Git Repo - linux.git/log


commit | commitdiff | tree

Akinobu Mita [Sun, 26 Feb 2023 12:42:54 +0000 (21:42 +0900)]

nvme-tcp: don't access released socket during error recovery

While the error recovery work is temporarily failing reconnect attempts,
running the 'nvme list' command causes a kernel NULL pointer dereference
by calling getsockname() with a released socket.

During error recovery work, the nvme tcp socket is released and a new one
created, so it is not safe to access the socket without a proper check.

Signed-off-by: Akinobu Mita <[email protected]>
Fixes: 02c57a82c008 ("nvme-tcp: print actual source IP address through sysfs "address" attr")
Reviewed-by: Martin Belanger <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Dan Carpenter [Thu, 16 Feb 2023 12:14:49 +0000 (15:14 +0300)]

nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()

This function was transitioned from returning NVMe status codes to
returning traditional kernel error codes. However, this particular
return now accidentally returns positive error codes like ENOMEM instead
of negative -ENOMEM.
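
The sign matters because callers in this style usually test `ret < 0`. The following is a minimal userspace sketch of that convention, not the NVMe code itself; the function names and the `simulate_failure` parameter are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Buggy variant: returns a positive errno value on allocation failure. */
int alloc_ctx_buggy(void **out, size_t len, int simulate_failure)
{
	*out = simulate_failure ? NULL : malloc(len);
	if (!*out)
		return ENOMEM;	/* positive: a "ret < 0" check misses it */
	return 0;
}

/* Fixed variant: returns the negative errno value. */
int alloc_ctx_fixed(void **out, size_t len, int simulate_failure)
{
	*out = simulate_failure ? NULL : malloc(len);
	if (!*out)
		return -ENOMEM;
	return 0;
}

/* A typical caller only treats negative values as errors. */
int caller_sees_error(int ret)
{
	return ret < 0;
}
```

With the buggy variant the caller would carry on as if the allocation had succeeded, which is why the one-character fix is worth a commit.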

Fixes: b0ef1b11d390 ("nvme-auth: don't use NVMe status codes")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 21 Feb 2023 22:02:25 +0000 (14:02 -0800)]

nvme: bring back auto-removal of deleted namespaces during sequential scan

Bring back the check of the Identify Namespace return value for the
legacy NVMe 1.0-style sequential scanning. While NVMe 1.0 does not
support namespace management, there are "modern" cloud solutions like
Google Cloud Platform that claim the obsolete 1.0 compliance for no
good reason while supporting proprietary sideband namespace management.

Fixes: 1a893c2bfef4 ("nvme: refactor namespace probing")
Reported-by: Nils Hanke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Tested-by: Nils Hanke <[email protected]>

commit | commitdiff | tree

Keith Busch [Fri, 24 Feb 2023 15:34:24 +0000 (07:34 -0800)]

nvme: fix sparse warning on effects masking

The log entries are stored in le32, so use appropriate byte swapping
macros.
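
As a rough userspace illustration of why the conversion matters, the sketch below masks bits in a field that is stored little-endian, independent of host byte order. The helper names are hypothetical; the kernel uses its own `le32_to_cpu()`/`cpu_to_le32()` macros on `__le32` types rather than byte arrays.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes on any host. */
uint32_t le32_decode(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Encode a host-order value into little-endian bytes. */
void le32_encode(uint32_t v, unsigned char b[4])
{
	b[0] = (unsigned char)(v & 0xff);
	b[1] = (unsigned char)((v >> 8) & 0xff);
	b[2] = (unsigned char)((v >> 16) & 0xff);
	b[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Clear mask bits in a stored field: convert, mask, convert back. */
void le32_clear_bits(unsigned char field[4], uint32_t mask)
{
	le32_encode(le32_decode(field) & ~mask, field);
}
```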

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Jens Axboe [Fri, 24 Feb 2023 17:01:19 +0000 (10:01 -0700)]

block: be a bit more careful in checking for NULL bdev while polling

Wei reports a crash with an application using polled IO:

PGD 14265e067 P4D 14265e067 PUD 47ec50067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 21915 Comm: iocore_0 Kdump: loaded Tainted: G S                5.12.0-0_fbk12_clang_7346_g1bb6f2e7058f #1
Hardware name: Wiwynn Delta Lake MP T8/Delta Lake-Class2, BIOS Y3DLM08 04/10/2022
RIP: 0010:bio_poll+0x25/0x200
Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 48 8b 47 08 <48> 8b 80 70 02 00 00 4c 8b 70 50 8b 6f 34 31 db 83 fd ff 75 25 65
RSP: 0018:ffffc90005fafdf8 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 74b43cd65dd66600
RDX: 0000000000000003 RSI: ffffc90005fafe78 RDI: ffff8884b614e140
RBP: ffff88849964df78 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88849964df00
R13: ffffc90005fafe78 R14: ffff888137d3c378 R15: 0000000000000001
FS:  00007fd195000640(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000270 CR3: 0000000466121001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
iocb_bio_iopoll+0x1d/0x30
io_do_iopoll+0xac/0x250
__se_sys_io_uring_enter+0x3c5/0x5a0
? __x64_sys_write+0x89/0xd0
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x94f225d
Code: 24 cc 00 00 00 41 8b 84 24 d0 00 00 00 c1 e0 04 83 e0 10 41 09 c2 8b 33 8b 53 04 4c 8b 43 18 4c 63 4b 0c b8 aa 01 00 00 0f 05 <85> c0 0f 88 85 00 00 00 29 03 45 84 f6 0f 84 88 00 00 00 41 f6 c7
RSP: 002b:00007fd194ffcd88 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 00007fd194ffcdc0 RCX: 00000000094f225d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fd194ffcdb0 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000001 R11: 0000000000000202 R12: 00007fd269d68030
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000

which is due to bio->bi_bdev being NULL. This can happen if we have two
tasks doing polled IO, and task B ends up completing IO from task A if
they are sharing a poll queue. If task B completes the IO and puts the
bio into our cache, then it can allocate that bio again before task A
is done polling for it. As that would necessitate a preempt between the
two tasks, it's enough to just be a bit more careful in checking for
whether or not bio->bi_bdev is NULL.
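
The defensive pattern described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the block layer code: `struct fake_bio`, `struct fake_bdev`, and `poll_queue_id()` are hypothetical stand-ins.

```c
#include <stddef.h>

struct fake_bdev {
	int queue_id;
};

struct fake_bio {
	/* May be cleared by another task that recycles the bio. */
	struct fake_bdev *volatile bdev;
};

/* Returns the queue id to poll, or -1 if the bio has no device anymore. */
int poll_queue_id(struct fake_bio *bio)
{
	/* Load the pointer once; all later uses go through the local copy. */
	struct fake_bdev *bdev = bio->bdev;

	if (!bdev)
		return -1;
	return bdev->queue_id;
}
```

Reading the pointer into a local before the NULL check avoids checking one value and then dereferencing a different one.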

Reported-and-tested-by: Wei Zhang <[email protected]>
Cc: [email protected]
Fixes: be4d234d7aeb ("bio: add allocation cache abstraction")
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Fri, 24 Feb 2023 16:59:44 +0000 (09:59 -0700)]

block: clear bio->bi_bdev when putting a bio back in the cache

This isn't strictly needed in terms of correctness, but it does allow
polling to know if the bio has been put already by a different task
and hence avoid polling something that we don't need to.

Cc: [email protected]
Fixes: be4d234d7aeb ("bio: add allocation cache abstraction")
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Zhong Jinghua [Tue, 21 Feb 2023 09:50:27 +0000 (17:50 +0800)]

loop: loop_set_status_from_info() check before assignment

In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should
be checked before reassignment, because if an overflow error occurs, the
original correct value will be changed to the wrong value, and it will not
be changed back.

Moreover, the original patch did not solve the problem: the value was set
and the ioctl returned an error, but subsequent I/O still used the value in
the loop driver, which still caused an alarm:

loop_handle_cmd
do_req_filebacked
  loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
  lo_rw_aio
   cmd->iocb.ki_pos = pos
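
A validate-then-commit version of this idea might look like the sketch below. The struct and function names are hypothetical, and the exact bounds used by the loop driver may differ; the point is only that nothing is written to the live configuration until every check has passed.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Live configuration, stored as signed offsets (like loff_t). */
struct loop_cfg {
	long long offset;
	long long sizelimit;
};

/* Values requested by userspace, unsigned 64-bit. */
struct loop_req {
	uint64_t offset;
	uint64_t sizelimit;
};

int loop_cfg_apply(struct loop_cfg *cfg, const struct loop_req *req)
{
	/* Check first: reject values that would overflow a signed offset. */
	if (req->offset > (uint64_t)LLONG_MAX ||
	    req->sizelimit > (uint64_t)LLONG_MAX)
		return -EOVERFLOW;

	/* Assign only after all checks have passed. */
	cfg->offset = (long long)req->offset;
	cfg->sizelimit = (long long)req->sizelimit;
	return 0;
}
```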

Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop")
Signed-off-by: Zhong Jinghua <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Ming Lei [Mon, 20 Feb 2023 04:14:13 +0000 (12:14 +0800)]

ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd

sizeof(struct ublksrv_io_cmd) is 16 bytes, which fits in a 64-byte SQE,
so it is not necessary to check IO_URING_F_SQE128.

With this change, we get a chance to save half of the SQ ring memory.
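
The size argument can be checked with a small sketch. The struct below is an assumed mirror of a 16-byte command, and the 48-byte offset of the command area inside a 64-byte SQE is likewise an assumption made for illustration; neither is copied from the uapi headers.

```c
#include <stdint.h>

/* Assumed layout of a 16-byte I/O command (field names are illustrative). */
struct io_cmd_sketch {
	uint16_t q_id;
	uint16_t tag;
	int32_t result;
	uint64_t addr;
};

enum {
	SQE_SIZE = 64,		/* regular (non-SQE128) entry size */
	SQE_CMD_OFFSET = 48	/* assumed start of the inline command area */
};

/* Returns 1 if the command fits in the inline area of a 64-byte SQE. */
int cmd_fits_in_small_sqe(void)
{
	return sizeof(struct io_cmd_sketch) <= SQE_SIZE - SQE_CMD_OFFSET;
}
```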

Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Juhyung Park [Fri, 3 Feb 2023 02:40:29 +0000 (11:40 +0900)]

block: remove more NULL checks after bdev_get_queue()

bdev_get_queue() never returns NULL. Several commits [1][2] have been made
before to remove such superfluous checks, but some still remained.

For places where bdev_get_queue() is called solely for NULL checks, it is
removed entirely.

[1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check")
[2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()")

Signed-off-by: Juhyung Park <[email protected]>
Reviewed-by: Pankaj Raghav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christophe JAILLET [Fri, 17 Feb 2023 09:29:10 +0000 (10:29 +0100)]

blk-mq: Reorder fields in 'struct blk_mq_tag_set'

Group some variables based on their sizes to reduce holes and avoid padding.
On x86_64, this shrinks the size of 'struct blk_mq_tag_set'
from 304 to 296 bytes.
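
The effect is easy to reproduce on a toy structure. The two structs below are hypothetical and unrelated to 'struct blk_mq_tag_set'; they only show how grouping members by size removes alignment holes.

```c
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with large ones: holes after a, c and e. */
struct holey {
	uint8_t a;
	uint64_t b;
	uint8_t c;
	uint64_t d;
	uint8_t e;
};

/* Same members, large ones first: only tail padding remains. */
struct grouped {
	uint64_t b;
	uint64_t d;
	uint8_t a;
	uint8_t c;
	uint8_t e;
};

/* Number of bytes saved by the reordering on the current ABI. */
size_t bytes_saved(void)
{
	return sizeof(struct holey) - sizeof(struct grouped);
}
```

On a typical x86_64 ABI the two structs should come out at 40 and 24 bytes; a tool such as pahole reports the holes directly.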

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/6f249f9b02a3490283ef0278096556de41aa0cf0.1676626130.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Yu Kuai [Fri, 17 Feb 2023 02:22:00 +0000 (10:22 +0800)]

block: fix scan partition for exclusively open device again

As explained in commit 36369f46e917 ("block: Do not reread partition table
on exclusively open device"), rereading the partition table on a device that
is exclusively opened by someone else is problematic.

This patch makes sure a partition scan only proceeds if the current
thread has opened the device exclusively, or if the device is not opened
exclusively; in the latter case, other scanners and exclusive openers
are blocked temporarily until the partition scan is done.

Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Yu Kuai [Fri, 17 Feb 2023 02:21:59 +0000 (10:21 +0800)]

block: Revert "block: Do not reread partition table on exclusively open device"

This reverts commit 36369f46e91785688a5f39d7a5590e3f07981316.

This patch can't fix the problem in the corner case where the device is
opened exclusively after the check and before blkdev_get_by_dev().
We'll use a new solution to fix the problem in the next patch, and
the new solution doesn't need to change APIs.

Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Luca Boccassi [Fri, 10 Feb 2023 01:06:12 +0000 (01:06 +0000)]

sed-opal: add support flag for SUM in status ioctl

Not every OPAL drive supports SUM (Single User Mode), so report this
information to userspace via the get-status ioctl so that we can adjust
the formatting options accordingly.
Tested on a Kingston drive (which supports it) and a Samsung one
(which does not).

Signed-off-by: Luca Boccassi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Pankaj Raghav [Fri, 17 Feb 2023 12:14:44 +0000 (17:44 +0530)]

brd: use radix_tree_maybe_preload instead of radix_tree_preload

Unconditionally calling radix_tree_preload_end() results in an OOPS
message as the preload is only conditionally called for
gfpflags_allow_blocking().

[   20.267323] BUG: using smp_processor_id() in preemptible [00000000] code: fio/416
[   20.267837] caller is brd_insert_page.part.0+0xbe/0x190 [brd]
[   20.269436] Call Trace:
[   20.269598]  <TASK>
[   20.269742]  dump_stack_lvl+0x32/0x50
[   20.269982]  check_preemption_disabled+0xd1/0xe0
[   20.270289]  brd_insert_page.part.0+0xbe/0x190 [brd]
[   20.270664]  brd_submit_bio+0x33f/0xf40 [brd]

Use radix_tree_maybe_preload() which does preload only if
gfpflags_allow_blocking() is true but also takes the lock. Therefore,
unconditionally calling radix_tree_preload_end() should not create any
issues and the message disappears.
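
The imbalance can be modeled with a toy counter. This is a sketch only: `lock_depth` and the `model_*` helpers are hypothetical stand-ins for the lock taken by the preload helpers, and no radix tree is involved.

```c
#include <stdbool.h>

/* Toy stand-in for the lock state; 0 means lock/unlock are balanced. */
static int lock_depth;

static void model_preload(void)
{
	lock_depth++;		/* preload takes the lock */
}

static void model_maybe_preload(bool can_block)
{
	(void)can_block;	/* a real helper would preload only if true */
	lock_depth++;		/* but the lock is taken either way */
}

static void model_preload_end(void)
{
	lock_depth--;		/* always releases the lock */
}

/* Old pattern: conditional preload, unconditional end. */
int insert_conditional_preload(bool can_block)
{
	lock_depth = 0;
	if (can_block)
		model_preload();
	model_preload_end();
	return lock_depth;	/* negative means an unbalanced release */
}

/* New pattern: maybe_preload always pairs with the end call. */
int insert_maybe_preload(bool can_block)
{
	lock_depth = 0;
	model_maybe_preload(can_block);
	model_preload_end();
	return lock_depth;
}
```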

Fixes: 6ded703c56c2 ("brd: check for REQ_NOWAIT and set correct page allocation mask")
Signed-off-by: Pankaj Raghav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Fri, 17 Feb 2023 02:39:15 +0000 (19:39 -0700)]

block: use proper return value from bio_failfast()

kernel test robot complains about a type mismatch:

   block/blk-merge.c:984:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:984:42: sparse:     got unsigned int
   block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted blk_opf_t const [usertype] ff @@     got unsigned int @@
   block/blk-merge.c:1010:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:1010:42: sparse:     got unsigned int

because bio_failfast() returns an unsigned int rather than the
appropriate blk_opf_t type. Fix it up.

Fixes: 3ce6a115980c ("block: sync mixed merged request's failfast with 1st bio's")
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Martin K. Petersen [Wed, 15 Feb 2023 17:18:01 +0000 (12:18 -0500)]

block: bio-integrity: Copy flags when bio_integrity_payload is cloned

Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.
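
A minimal sketch of the clone rule follows, assuming hypothetical flag values and a simplified payload struct; the real flag definitions and structure live in the block layer headers and are not reproduced here.

```c
/* Hypothetical flag bits, for illustration only. */
#define SK_BIP_BLOCK_INTEGRITY	(1u << 0)	/* payload owns its buffer */
#define SK_BIP_IP_CHECKSUM	(1u << 4)	/* per-I/O property */

struct bip_sketch {
	unsigned int flags;
	void *buf;
};

/* Clone shares the buffer and inherits flags, minus the ownership bit. */
void bip_sketch_clone(struct bip_sketch *dst, const struct bip_sketch *src)
{
	dst->buf = src->buf;
	dst->flags = src->flags & ~SK_BIP_BLOCK_INTEGRITY;
}
```

Because only the original keeps the ownership bit, a completion path that frees the buffer when the bit is set would do so exactly once.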

Fixes: aae7df50190a ("block: Integrity checksum flag")
Fixes: b1f01388574c ("block: Relocate bio integrity flags")
Reported-by: Saurav Kashyap <[email protected]>
Tested-by: Saurav Kashyap <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jinke Han [Thu, 16 Feb 2023 03:22:50 +0000 (11:22 +0800)]

block: Fix io statistics for cgroup in throttle path

In the current code, I/O statistics are missing for a cgroup when a bio
is throttled by blk-throttle. Fix it by moving the unreached code
to submit_bio_noacct_nocheck().

Fixes: 3f98c753717c ("block: don't check bio in blk_throtl_dispatch_work_fn")
Signed-off-by: Jinke Han <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Acked-by: Muchun Song <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 15 Feb 2023 23:43:47 +0000 (16:43 -0700)]

brd: mark as nowait compatible

By default, non-mq drivers do not support nowait. This causes io_uring
to use a slower path as the driver cannot be trusted not to block. brd
can safely set the nowait flag, as worst case all it does is a NOIO
allocation.

For io_uring, this makes a substantial difference. Before:

submitter=0, tid=453, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31
IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32
IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31
IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32
IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31

and after:

submitter=0, tid=354, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31
IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32
IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31

or about an 8x difference. Now that brd is prepared to deal with
REQ_NOWAIT reads/writes, mark it as supporting that.

Cc: [email protected] # 5.10+
Link: https://lore.kernel.org/linux-block/[email protected]/
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Thu, 16 Feb 2023 15:01:08 +0000 (08:01 -0700)]

brd: check for REQ_NOWAIT and set correct page allocation mask

If REQ_NOWAIT is set, then do a non-blocking allocation if the operation
is a write and we need to insert a new page. Currently REQ_NOWAIT cannot
be set as the queue isn't marked as supporting nowait, this change is in
preparation for allowing that.

radix_tree_preload() warns on attempting to call it with an allocation
mask that doesn't allow blocking. While that warning could arguably
be removed, we need to handle radix insertion failures anyway as they
are more likely if we cannot block to get memory.

Remove legacy BUG_ON()'s and turn them into proper errors instead, one
for the allocation failure and one for finding a page that doesn't
match the correct index.

Cc: [email protected] # 5.10+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Thu, 16 Feb 2023 14:57:32 +0000 (07:57 -0700)]

brd: return 0/-error from brd_insert_page()

It currently returns a page, but callers just check for NULL/page to
gauge success. Clean this up and return the appropriate error directly
instead.
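
The shape of such a cleanup can be sketched in userspace C as below. The slot table, `struct page_sketch`, and the `-EINVAL` bounds check are assumptions made for this example and do not reflect the brd data structures.

```c
#include <errno.h>
#include <stdlib.h>

#define NSLOTS 8

struct page_sketch {
	unsigned long index;
};

static struct page_sketch *slots[NSLOTS];

/* Returns 0 if a page exists or was inserted at idx, -errno otherwise. */
int insert_page(unsigned long idx)
{
	struct page_sketch *p;

	if (idx >= NSLOTS)
		return -EINVAL;
	if (slots[idx])
		return 0;	/* already present: success, no pointer needed */

	p = malloc(sizeof(*p));
	if (!p)
		return -ENOMEM;
	p->index = idx;
	slots[idx] = p;
	return 0;
}
```

A caller then writes `ret = insert_page(idx); if (ret) return ret;` and propagates the specific error instead of collapsing every failure into a NULL check.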

Cc: [email protected] # 5.10+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Ming Lei [Thu, 9 Feb 2023 12:55:27 +0000 (20:55 +0800)]

block: sync mixed merged request's failfast with 1st bio's

We support mixed merge for requests/bios with different failfast
settings. When a request fails, each time we only handle the portion
with the same failfast setting, so bios with failfast can be failed
immediately, and bios without failfast can be retried.

The idea is pretty good, but the current implementation has several
defects:

1) initially an RA bio doesn't set failfast; however, the bio merge code
doesn't consider this and just checks its failfast setting when
deciding if a mixed merge is required. Fix this issue by adding the
bio_failfast() helper.

2) when merging a bio to the request front, if this request is mixed
merged, we have to sync the request's failfast setting with the 1st bio's
failfast. Fix it by calling blk_update_mixed_merge().

3) when merging a bio to the request back, if this request is mixed
merged, we have to mark the bio as failfast, because blk_update_request
simply updates the request's failfast with the 1st bio's failfast. Fix
it by calling blk_update_mixed_merge().

This fixes a normal EXT4 READ IO failure issue: it was observed
that the normal READ IO is merged with RA IO, and the mixed merged
request has a different failfast setting from its 1st bio's, so in the end
the normal READ IO doesn't get retried.

Cc: Tejun Heo <[email protected]>
Fixes: 80a761fd33cf ("block: implement mixed merge of different failfast requests")
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 15 Feb 2023 20:47:57 +0000 (13:47 -0700)]

Merge tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme into for-6.3/block

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.3

- fix and cleanup freeing single sgl (Keith Busch)"

* tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme:
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl

commit | commitdiff | tree

Christoph Hellwig [Tue, 14 Feb 2023 18:33:08 +0000 (19:33 +0100)]

Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"

This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 14 Feb 2023 18:33:07 +0000 (19:33 +0100)]

Revert "blk-cgroup: pass a gendisk to blkg_lookup"

This reverts commit 821e840c08ad83736eced4037cdad864e95e2584.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 14 Feb 2023 18:33:06 +0000 (19:33 +0100)]

Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"

This reverts commit 178fa7d49815ea8001f43ade37a22072829fd8ab.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 14 Feb 2023 18:33:05 +0000 (19:33 +0100)]

Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"

This reverts commit c43332fe028c252a2a28e46be70a530f64fc3c9d as it is not
needed without moving to disk references in the blkg.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 14 Feb 2023 18:33:04 +0000 (19:33 +0100)]

Revert "blk-cgroup: move the cgroup information to struct gendisk"

This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch
it depends on caused a few problems.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Keith Busch [Fri, 10 Feb 2023 18:03:47 +0000 (10:03 -0800)]

nvme-pci: remove iod use_sgls

It's not used anywhere anymore, so remove it.

Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Keith Busch [Fri, 10 Feb 2023 18:03:46 +0000 (10:03 -0800)]

nvme-pci: fix freeing single sgl

There may only be a single DMA mapped entry from multiple physical
segments, which means we don't allocate a separate SGL list. Check the
number of allocations first to know if we need to free something.

Freeing a single list allocation is the same for both PRP and SGL
usages, so we don't need to check the use_sgl flag anymore.

Fixes: 01df742d8c5c0 ("nvme-pci: remove SGL segment descriptors")
Reported-by: Niklas Schnelle <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Niklas Schnelle <[email protected]>

commit | commitdiff | tree

Liu Xiaodong [Fri, 10 Feb 2023 14:13:56 +0000 (09:13 -0500)]

block: ublk: check IO buffer based on flag need_get_data

Currently, uring_cmd with UBLK_IO_FETCH_REQ or
UBLK_IO_COMMIT_AND_FETCH_REQ is always checked for whether the
userspace server has provided an IO buffer, even when the flag
UBLK_F_NEED_GET_DATA is configured.

This is an excessive check. If UBLK_F_NEED_GET_DATA is
configured, FETCH_REQ doesn't need to provide an IO buffer;
COMMIT_AND_FETCH_REQ also doesn't need to do that if
the IO type is not READ.

Check ub_cmd->addr together with ublk_need_get_data()
and IO type in ublk_ch_uring_cmd().

With this fix, userspace server doesn't need to preserve
buffers for every ublk_io when flag UBLK_F_NEED_GET_DATA
is configured, in order to save memory.
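
The rule described in this message can be written as a pair of predicates. This is a sketch with hypothetical names, not the code in ublk_ch_uring_cmd(); it only encodes when a missing buffer address should be rejected.

```c
#include <stdbool.h>
#include <stdint.h>

enum io_dir { IO_READ, IO_WRITE };

/* FETCH_REQ must carry a buffer only when NEED_GET_DATA is off. */
bool fetch_needs_buffer(bool need_get_data)
{
	return !need_get_data;
}

/* COMMIT_AND_FETCH_REQ must carry one when NEED_GET_DATA is off or on READ. */
bool commit_and_fetch_needs_buffer(bool need_get_data, enum io_dir dir)
{
	return !need_get_data || dir == IO_READ;
}

/* A zero address is acceptable only if no buffer is required. */
bool cmd_addr_ok(uint64_t addr, bool buffer_required)
{
	return addr != 0 || !buffer_required;
}
```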

Signed-off-by: Liu Xiaodong <[email protected]>
Fixes: c86019ff75c1 ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA")
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Qiheng Lin [Fri, 10 Feb 2023 00:02:53 +0000 (01:02 +0100)]

s390/dasd: Fix potential memleak in dasd_eckd_init()

`dasd_reserve_req` is allocated before `dasd_vol_info_req`, and it
also needs to be freed before the error returns, just like the other
cases in this function.
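
The general pattern, releasing earlier allocations when a later one fails, can be sketched as follows. The names `req_a`, `req_b`, and `init_two()` are hypothetical, and `fail_second` exists only to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

static void *req_a;
static void *req_b;

/* Allocates two requests; on failure nothing stays allocated. */
int init_two(int fail_second)
{
	req_a = malloc(32);
	if (!req_a)
		return -ENOMEM;

	req_b = fail_second ? NULL : malloc(32);
	if (!req_b) {
		/* Undo the earlier allocation before returning the error. */
		free(req_a);
		req_a = NULL;
		return -ENOMEM;
	}
	return 0;
}
```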

Fixes: 9e12e54c7a8f ("s390/dasd: Handle out-of-space constraint")
Signed-off-by: Qiheng Lin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Alexander Gordeev [Fri, 10 Feb 2023 00:02:52 +0000 (01:02 +0100)]

s390/dasd: sort out physical vs virtual pointers usage

This does not fix a real bug, since virtual addresses
are currently identical to physical ones.

Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Bart Van Assche [Thu, 9 Feb 2023 23:01:35 +0000 (15:01 -0800)]

block: Remove the ALLOC_CACHE_SLACK constant

Commit b99182c501c3 ("bio: add pcpu caching for non-polling bio_put")
removed the code that uses this constant. Hence also remove the constant
itself.

Cc: Pavel Begunkov <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Thomas Weißschuh [Wed, 8 Feb 2023 04:01:22 +0000 (04:01 +0000)]

block: make kobj_type structures constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Xiao Ni [Thu, 9 Feb 2023 03:19:30 +0000 (11:19 +0800)]

block: Merge bio before checking ->cached_rq

The code checks whether plug->cached_rq is empty before merging a bio. But the
merge has no relationship with plug->cached_rq; it tries to merge the bio with
requests within plug->mq_list. If ->cached_rq is empty, the merge chance is
missed. So move the merge function before the ->cached_rq check.

Signed-off-by: Xiao Ni <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Thu, 9 Feb 2023 05:35:23 +0000 (06:35 +0100)]

Revert "blk-cgroup: simplify blkg freeing from initialization failure paths"

It turns out this was too soon. blkg_conf_prep plays too funky locking games
with the queue lock for this to work properly.

This reverts commit 27b642b07a4a5eb44dffa94a5171ce468bdc46f9.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 8 Feb 2023 06:35:14 +0000 (07:35 +0100)]

blk-cgroup: delay calling blkcg_exit_disk until disk_release

While del_gendisk ensures there is no outstanding I/O on the queue,
it can't prevent block layer users from building new I/O.

This leads to a NULL ->root_blkg reference in bio_associate_blkg when
allocating a new bio on a shut down file system. Delay freeing the
blk-cgroup subsystems from del_gendisk until disk_release to make
sure the blkg and throttle information is still available for bio
submitters, even if those bios will immediately fail.

This now can cause a case where disk_release is called on a disk
that hasn't been added. That's mostly harmless, except for a case
in blk_throtl_exit that now needs to check for a NULL ->td pointer.

Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
Reported-by: Ming Lei <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 8 Feb 2023 23:59:35 +0000 (16:59 -0700)]

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.3/block

Pull MD fix from Song:

"This commit fixes a rare crash during the takeover process."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: account io_acct_set usage with active_io

commit | commitdiff | tree

Xiao Ni [Fri, 3 Feb 2023 05:13:44 +0000 (13:13 +0800)]

md: account io_acct_set usage with active_io

io_acct_set was enabled for raid0/raid5 io accounting. bios that contain
md_io_acct are allocated in the i/o path. There isn't a good method to
monitor if these bios are all finished and freed. In the takeover process,
io_acct_set (which is used for bios with md_io_acct) needs to be freed.
However, if some bios finish after io_acct_set is freed, it may trigger
the following panic:

[ 6973.767999] RIP: 0010:mempool_free+0x52/0x80
[ 6973.786098] Call Trace:
[ 6973.786549]  md_end_io_acct+0x31/0x40
[ 6973.787227]  blk_update_request+0x224/0x380
[ 6973.787994]  blk_mq_end_request+0x1a/0x130
[ 6973.788739]  blk_complete_reqs+0x35/0x50
[ 6973.789456]  __do_softirq+0xd7/0x2c8
[ 6973.790114]  ? sort_range+0x20/0x20
[ 6973.790763]  run_ksoftirqd+0x2a/0x40
[ 6973.791400]  smpboot_thread_fn+0xb5/0x150
[ 6973.792114]  kthread+0x10b/0x130
[ 6973.792724]  ? set_kthread_struct+0x50/0x50
[ 6973.793491]  ret_from_fork+0x1f/0x40

Fix this by increasing and decreasing active_io for each bio with
md_io_acct so that mddev_suspend() will wait until all bios from
io_acct_set finish before freeing io_acct_set.

Reported-by: Fine Fan <[email protected]>
Signed-off-by: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>

commit | commitdiff | tree

Ming Lei [Tue, 7 Feb 2023 15:07:00 +0000 (23:07 +0800)]

block: ublk: improve handling device deletion

Inside ublk_ctrl_del_dev(), when the device is removed, we wait
until the device number is freed while holding the global lock
ublk_ctl_mutex. This isn't friendly from the user's viewpoint:

1) if device is in-use, the current delete command hangs in
ublk_ctrl_del_dev(), and user can't break from the handling
because wait_event() is used

2) global lock is held, so any new device can't be added and
other old devices can't be removed.

Improve the deletion handling in the following way, suggested by
Nadav:

1) wait without holding the global lock

2) replace wait_event() with wait_event_interruptible()

Reported-by: Nadav Amit <[email protected]>
Suggested-by: Nadav Amit <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Yu Kuai [Thu, 2 Feb 2023 13:49:13 +0000 (21:49 +0800)]

block, bfq: cleanup 'bfqg->online'

After commit dfd6200a0954 ("blk-cgroup: support to track if policy is
online"), there is no need to do this again in bfq.

However, 'pd->online' is not protected by 'bfqd->lock'. In order to make
sure bfq won't see that 'pd->online' is still set after bfq_pd_offline(),
clear it before bfq_pd_offline() is called. This is fine because other
policies don't use 'pd->online' and bfq_pd_offline() will move active
bfqq to root cgroup anyway.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Tue, 7 Feb 2023 14:22:58 +0000 (07:22 -0700)]

Merge tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme into for-6.3/block

Pull NVMe updates from Christoph:

"nvme updates for Linux 6.3

- small improvements to the logging functionality (Amit Engel)
- authentication cleanups (Hannes Reinecke)
- cleanup and optimize the DMA mapping code in the PCIe driver
   (Keith Busch)
- work around the command effects for Format NVM (Keith Busch)
- misc cleanups (Keith Busch, Christoph Hellwig)"

* tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme:
  nvme: mask CSE effects for security receive
  nvme: always initialize known command effects
  nvmet: for nvme admin set_features cmd, call nvmet_check_data_len_lte()
  nvme-tcp: add additional info for nvme_tcp_timeout log
  nvme: add nvme_opcode_str function for all nvme cmd types
  nvme: remove nvme_execute_passthru_rq
  nvme-pci: place descriptor addresses in iod
  nvme-pci: use mapped entries for sgl decision
  nvme-pci: remove SGL segment descriptors
  nvme-auth: don't use NVMe status codes
  nvme-fabrics: clarify AUTHREQ result handling

commit | commitdiff | tree

Ziyang Zhang [Tue, 7 Feb 2023 07:08:39 +0000 (15:08 +0800)]

ublk: pass NULL to blk_mq_alloc_disk() as queuedata

queuedata is not referenced in ublk_drv and we can use driver_data
instead. Pass NULL to blk_mq_alloc_disk() as queuedata while allocating
ublk's gendisk.

Signed-off-by: Ziyang Zhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Ziyang Zhang [Tue, 7 Feb 2023 07:08:38 +0000 (15:08 +0800)]

ublk: mention WRITE_ZEROES in comment of ublk_complete_rq()

WRITE_ZEROES does not report a byte count, just like FLUSH and DISCARD,
so we can end it directly. Add the missing comment for it in
ublk_complete_rq().

Signed-off-by: Ziyang Zhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Ziyang Zhang [Tue, 7 Feb 2023 07:08:37 +0000 (15:08 +0800)]

ublk: remove unnecessary NULL check in ublk_rq_has_data()

bio_has_data() allows a NULL bio so the NULL check in
ublk_rq_has_data() is unnecessary.
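
A minimal userspace sketch of why the extra check is redundant, assuming
the callee already treats a NULL pointer as "no data". The hd_* types and
functions are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a bio and a request. */
struct hd_bio {
	unsigned int size;  /* payload bytes */
};

struct hd_rq {
	struct hd_bio *bio; /* may be NULL for requests without a bio */
};

/* The callee tolerates NULL itself, so callers need no check of their own. */
static bool hd_bio_has_data(const struct hd_bio *bio)
{
	return bio != NULL && bio->size > 0;
}

/* The wrapper simply forwards the possibly-NULL pointer. */
static bool hd_rq_has_data(const struct hd_rq *rq)
{
	return hd_bio_has_data(rq->bio);
}
```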

Signed-off-by: Ziyang Zhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Greg Kroah-Hartman [Thu, 2 Feb 2023 14:19:56 +0000 (15:19 +0100)]

trace/blktrace: fix memory leak with using debugfs_lookup()

When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.
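
The leak can be illustrated with a small reference-counting model. The
lk_* names below are hypothetical userspace stand-ins for the lookup,
put, and combined helpers; they only mimic the counting behavior
described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory entry with a reference count. */
struct lk_dentry {
	int refcount;
	bool removed;
};

/* A lookup returns the entry with an extra reference taken. */
static struct lk_dentry *lk_lookup(struct lk_dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Drop a reference previously taken by lk_lookup(). */
static void lk_put(struct lk_dentry *d)
{
	if (d)
		d->refcount--;
}

static void lk_remove(struct lk_dentry *d)
{
	if (d)
		d->removed = true;
}

/* Leaky pattern: the reference from the lookup is never dropped. */
static void lk_leaky_remove(struct lk_dentry *d)
{
	lk_remove(lk_lookup(d));
}

/* Combined helper: look up, remove, and drop the reference in one place. */
static void lk_lookup_and_remove(struct lk_dentry *d)
{
	struct lk_dentry *found = lk_lookup(d);

	if (!found)
		return;
	lk_remove(found);
	lk_put(found);
}
```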

Cc: Jens Axboe <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:26 +0000 (17:37 +0800)]

blk-mq: correct stale comment of .get_budget

Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget")
removed the BLK_STS_RESOURCE return value, and we now only check whether
we can get the budget from .get_budget().
Correct the stale comment ".get_budget() returns BLK_STS_NO_RESOURCE"
to ".get_budget() fails to get the budget".

Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:25 +0000 (17:37 +0800)]

blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly

Use switch/case to handle errors, as other functions do, to improve
readability in blk_mq_try_issue_list_directly.

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:24 +0000 (17:37 +0800)]

blk-mq: remove set of bd->last when get driver tag for next request fails

Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
correctly") sets last if we fail to get a driver tag for the next
request. This avoids a missed flush: we break the list walk and never
send the final request in the list, which would normally be sent with
last set.
This code seems stale now because the flush it introduces is always
redundant:
- If tags are really exhausted, we send an extra flush when we find
the list is not empty after the list walk.
- If some tag is freed before the retry in blk_mq_prep_dispatch_rq for
the next request, we get a tag for the next request on retry, and the
flush already notified is not necessary.

Just remove this stale code.

Signed-off-by: Kemeng Shi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:23 +0000 (17:37 +0800)]

blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_list

blk_mq_dispatch_rq_list reports whether the hctx is busy through its
bool return value: it returns true if we are not busy and can handle
more, and false otherwise. Inside blk_mq_dispatch_rq_list, errors is
only used if the list is empty, in which case we return true if
(errors + queued) != 0.

There are three types of status returned for a request:
- busy error BLK_STS*_RESOURCE: the failed request is added back
to the list, so the list will not be empty.
- BLK_STS_OK: we count queued for BLK_STS_OK.
- any other error: we count errors.

If the list is empty, no request got a busy error, so (errors +
queued) equals the total number of requests in the list, which is
checked to be non-empty at the beginning of blk_mq_dispatch_rq_list.
So (errors + queued) != 0 always holds if the list is empty, and both
that check and the errors count are unnecessary.
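
The counting argument can be checked with a small userspace model. It is
only a sketch: the disp_* names and the three-value status enum are
hypothetical, and the loop merely mimics the classification described
above.

```c
/* Hypothetical status values standing in for the three classes above. */
enum disp_sts { DISP_OK, DISP_RESOURCE, DISP_IOERR };

struct disp_counts {
	int queued;  /* requests that returned OK */
	int errors;  /* requests that failed with a non-busy error */
	int left;    /* requests still on the list when the walk ends */
};

/* Walk n requests; a busy status puts the request back and stops the walk. */
static struct disp_counts disp_walk(const enum disp_sts *sts, int n)
{
	struct disp_counts c = { 0, 0, 0 };
	int i;

	for (i = 0; i < n; i++) {
		if (sts[i] == DISP_RESOURCE) {
			c.left = n - i;  /* this request and the rest remain */
			break;
		}
		if (sts[i] == DISP_OK)
			c.queued++;
		else
			c.errors++;
	}
	return c;
}
```

Whenever left is 0 for a non-empty input, queued + errors equals the
input length, so the sum cannot be zero.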

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:22 +0000 (17:37 +0800)]

blk-mq: simplify flush check in blk_mq_dispatch_rq_list

1. Remove the checks of needs_resource and ret == BLK_STS_DEV_RESOURCE.
For a busy error BLK_STS*_RESOURCE, the request is always added
back to the list, so needs_resource will not be true and ret will
not be BLK_STS_DEV_RESOURCE if the list is empty. We can remove
these dead checks.

2. Check ret of the last request instead of errors.
If the list is empty, we only need to explicitly call commit_rqs
if an error happened at the last request, whose status is stored in
ret. So check ret of the last request instead of errors, to remove
the unnecessary commit_rqs triggered by errors from earlier requests.

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:21 +0000 (17:37 +0800)]

blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly

Call blk_mq_commit_rqs instead of accessing ->commit_rqs directly. As
the comment of blk_mq_commit_rqs explains, we only need to call it
explicitly in two cases:
- did not queue everything initially scheduled to queue
- the last attempt to queue a request failed
Both cases can be checked with ret of the last request, which breaks
the list walk. We can then remove the unnecessary error count and the
unnecessary commit triggered by errors outside the cases described above.
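
A rough userspace model of that decision, assuming the helper commits
only when at least one request was queued. The iss_* names are
hypothetical; the real functions operate on request lists and hardware
queue contexts.

```c
/* Hypothetical status values: OK, busy (stops the walk), other error. */
enum iss_sts { ISS_OK, ISS_RESOURCE, ISS_IOERR };

static int iss_commit_calls;  /* how often the driver was asked to commit */

/* Models the general helper: commit only if something was queued. */
static void iss_commit_rqs(int queued)
{
	if (queued)
		iss_commit_calls++;
}

/* Issue n requests in order and decide on the commit from the last status. */
static void iss_issue_list(const enum iss_sts *sts, int n)
{
	enum iss_sts ret = ISS_OK;
	int queued = 0;
	int i;

	for (i = 0; i < n; i++) {
		ret = sts[i];
		if (ret == ISS_OK)
			queued++;
		else if (ret == ISS_RESOURCE)
			break;  /* busy: the rest of the list is not issued */
		/* any other error: the request is ended, keep walking */
	}
	if (ret != ISS_OK)
		iss_commit_rqs(queued);
}
```

In this model an error on an earlier request followed by a successful
last request triggers no commit, which is the redundant case the message
removes.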

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:20 +0000 (17:37 +0800)]

blk-mq: remove unncessary error count and commit in blk_mq_plug_issue_direct

We only need to explicitly commit in two error cases:
- did not queue everything initially scheduled to queue
- the last attempt to queue a request failed
(see the comment of blk_mq_commit_rqs for more details).
Both cases can be checked with ret of the last request, which breaks
the list walk. Remove the unnecessary error count and the unnecessary
commit triggered by errors not covered by the cases described above.

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:19 +0000 (17:37 +0800)]

blk-mq: make blk_mq_commit_rqs a general function for all commits

1. move blk_mq_commit_rqs ahead of the functions that need to commit.
2. add a queued check to blk_mq_commit_rqs so that it only commits if
any request was queued, to keep commit behavior consistent and remove
unnecessary commits.
3. split the clearing of queued out of blk_mq_plug_commit_rqs, as it is
not wanted in general.
4. sync the current callers of blk_mq_commit_rqs with the new general
blk_mq_commit_rqs.
5. document the rule for unusual cases that need an explicit commit_rqs.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:18 +0000 (17:37 +0800)]

blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_direct

Function blk_mq_plug_issue_direct tries to issue the batched requests in
the plug list to the driver directly. We only issue plug requests to the
driver if we are not called from the scheduler, so the from_schedule
parameter of blk_mq_plug_issue_direct is always false.
Remove the unnecessary from_schedule parameter of
blk_mq_plug_issue_direct.

Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:17 +0000 (17:37 +0800)]

blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly

We only break the list walk if we get 'BLK_STS_*RESOURCE', and we also
count errors for 'BLK_STS_*RESOURCE'. If the list is not empty,
errors will always be non-zero, so we can remove the unnecessary
list_empty check. This also removes the redundant list_empty check for
the case where the error happened when sending the last request in the
list.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:16 +0000 (17:37 +0800)]

blk-mq: Fix potential io hung for shared sbitmap per tagset

Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared
tags") marks restart for unshared tags as an improvement. At that time,
tags were only shared between queues, and we could check whether tags
were shared by testing BLK_MQ_F_TAG_SHARED.
Afterwards, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per
tagset") enabled sharing tags between hctxs inside a queue. We only
mark restart for shared hctxs inside a queue, which may cause an I/O
hang if no tag is currently allocated by the hctxs that are going to be
marked restart.
Wait on the sbitmap_queue instead of marking restart in the shared
hctxs case to fix this.

Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:15 +0000 (17:37 +0800)]

blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait

In the shared queues case, we only wait on bitmap_tags if we fail to get
a driver tag. However, rq could be from breserved_tags, in which case
two problems occur:
1. I/O hangs if no tag is currently allocated from bitmap_tags.
2. unnecessary wakeups when a tag is freed to bitmap_tags while no tag is
freed to breserved_tags.
Wait on the bitmap that rq comes from to fix this.
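
A minimal sketch of the selection, under the assumption (made for this
model only) that reserved tags occupy the lowest tag numbers. The tq_*
types are hypothetical; only the field names bitmap_tags and
breserved_tags come from the message above.

```c
/* Hypothetical stand-in for a wait queue attached to one tag bitmap. */
struct tq_waitq {
	int waiters;
};

struct tq_tags {
	unsigned int nr_reserved_tags;  /* tags [0, nr_reserved_tags) are reserved */
	struct tq_waitq bitmap_tags;    /* normal tags */
	struct tq_waitq breserved_tags; /* reserved tags */
};

/* Pick the wait queue that matches the pool the request's tag came from. */
static struct tq_waitq *tq_wait_queue_for(struct tq_tags *tags, unsigned int tag)
{
	if (tag < tags->nr_reserved_tags)
		return &tags->breserved_tags;
	return &tags->bitmap_tags;
}
```

Waiting on the matching queue means the waiter is woken by a free in the
pool it can actually allocate from.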

Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:14 +0000 (17:37 +0800)]

blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx

Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") removed the handling of TAG_SHARED in
restart, so shared_hctx_restart, which counted how many hardware queues
were marked for restart, was removed too.
Remove the stale comment saying that we still count the hardware queues
that need a restart.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Kemeng Shi [Wed, 18 Jan 2023 09:37:13 +0000 (17:37 +0800)]

blk-mq: avoid sleep in blk_mq_alloc_request_hctx

Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") added
blk_mq_alloc_request_hctx to send commands to a specific queue. If
BLK_MQ_REQ_NOWAIT is not set during tag allocation, we may switch to a
different hctx after sleeping and get a tag from an unexpected hctx. So
BLK_MQ_REQ_NOWAIT must be set in flags for blk_mq_alloc_request_hctx.
After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx returns -EINVAL
only if neither BLK_MQ_REQ_NOWAIT nor BLK_MQ_REQ_RESERVED is set, instead
of whenever BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT is not
set and BLK_MQ_REQ_RESERVED is set, blk_mq_alloc_request_hctx could
allocate a tag from an unexpected hctx. I guess what we need here is to
return -EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not
set.

Currently both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are set whenever
a specific hctx is needed, in nvme_auth_submit, nvmf_connect_io_queue
and nvmf_connect_admin_queue. Fix the potential missing BLK_MQ_REQ_NOWAIT
case for future callers.
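
The difference between the two checks is easy to see in isolation. The
sketch below uses hypothetical FC_* bit values chosen for illustration
(the kernel's flag definitions may differ) and models only the
predicate, not the allocation.

```c
#include <errno.h>

/* Illustrative bit values; assumptions of this sketch, not kernel values. */
#define FC_REQ_NOWAIT   (1u << 0)
#define FC_REQ_RESERVED (1u << 1)

/* Check as described for the current code: rejects only if neither is set. */
static int fc_check_before(unsigned int flags)
{
	if (!(flags & (FC_REQ_NOWAIT | FC_REQ_RESERVED)))
		return -EINVAL;
	return 0;
}

/* Check the message asks for: rejects if either flag is missing. */
static int fc_check_after(unsigned int flags)
{
	if (!(flags & FC_REQ_NOWAIT) || !(flags & FC_REQ_RESERVED))
		return -EINVAL;
	return 0;
}
```

With only the reserved flag set, the first check passes and the second
one rejects, which is the case the message is concerned about.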

Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:02:09 +0000 (16:02 +0100)]

block: stub out and deprecated the capability attribute on the gendisk

The capability attribute was added in 2017 to expose the kernel internal
GENHD_FL_MEDIA_CHANGE_NOTIFY to userspace without ever adding a value to
a UAPI header, and without ever setting it in any driver until it was
finally removed in Linux 5.7.

Deprecate the file and always return 0 instead of exposing the other
internal and frequently renumbered gendisk flags.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Mon, 6 Feb 2023 15:02:01 +0000 (16:02 +0100)]

blk-cgroup: fix freeing NULL blkg in blkg_create

new_blkg can be NULL if the caller didn't pass in a pre-allocated blkg.
Don't try to free it in that case.
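
A minimal userspace sketch of the guard, assuming the free routine
dereferences its argument and therefore must not be handed NULL. The
fb_* names are hypothetical and the failure path is reduced to a single
function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a group object that owns a second allocation. */
struct fb_blkg {
	int *stats;
};

static int fb_free_calls;  /* counts how often the free routine ran */

/* Dereferences blkg, so callers must never pass NULL. */
static void fb_blkg_free(struct fb_blkg *blkg)
{
	free(blkg->stats);
	free(blkg);
	fb_free_calls++;
}

/* Failure path of a create routine whose preallocated object is optional. */
static int fb_create_fail(struct fb_blkg *new_blkg)
{
	if (new_blkg)
		fb_blkg_free(new_blkg);
	return -ENOMEM;
}
```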

Fixes: 27b642b07a4a ("blk-cgroup: simplify blkg freeing from initialization failure paths")
Reported-by: Yi Zhang <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:34 +0000 (16:06 +0100)]

libceph: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:33 +0000 (16:06 +0100)]

vringh: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:32 +0000 (16:06 +0100)]

sunrpc: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Acked-by: Trond Myklebust <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:31 +0000 (16:06 +0100)]

rxrpc: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:30 +0000 (16:06 +0100)]

swap: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:29 +0000 (16:06 +0100)]

io_uring: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:28 +0000 (16:06 +0100)]

splice: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:27 +0000 (16:06 +0100)]

orangefs: use bvec_set_{page,folio} to initialize bvecs

Use the bvec_set_page and bvec_set_folio helpers to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:26 +0000 (16:06 +0100)]

nfs: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Trond Myklebust <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:25 +0000 (16:06 +0100)]

coredump: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:24 +0000 (16:06 +0100)]

cifs: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Paulo Alcantara (SUSE) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:23 +0000 (16:06 +0100)]

ceph: use bvec_set_page to initialize a bvec

Use the bvec_set_page helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:22 +0000 (16:06 +0100)]

afs: use bvec_set_folio to initialize a bvec

Use the bvec_set_folio helper to initialize a bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:21 +0000 (16:06 +0100)]

zram: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:20 +0000 (16:06 +0100)]

virtio_blk: use bvec_set_virt to initialize special_vec

Use the bvec_set_virt helper to initialize the special_vec.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:19 +0000 (16:06 +0100)]

rbd: use bvec_set_page to initialize the copy up bvec

Use the bvec_set_page helper to initialize the copy up bvec.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:18 +0000 (16:06 +0100)]

nvme: use bvec_set_virt to initialize special_vec

Use the bvec_set_virt helper to initialize the special_vec.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:17 +0000 (16:06 +0100)]

nvmet: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:16 +0000 (16:06 +0100)]

target: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:15 +0000 (16:06 +0100)]

sd: factor out a sd_set_special_bvec helper

Add a helper for setting up the special_bvec instead of open coding it
in three places, and use the new bvec_set_page helper to initialize
special_vec.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:14 +0000 (16:06 +0100)]

block: add a bvec_set_virt helper

A small wrapper around bvec_set_page for callers that have a virtual
address.
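
What such a wrapper has to compute can be sketched in userspace: split
the address into a page base and an offset within that page. The vb_*
names and the 4096-byte page size are assumptions of this model; the
kernel version stores a struct page pointer rather than a base address.

```c
#include <stdint.h>

#define VB_PAGE_SIZE 4096u  /* assumed page size for this model */

/* Hypothetical stand-in: page base address instead of a struct page. */
struct vb_bvec {
	uintptr_t page_base;
	unsigned int len;
	unsigned int offset;
};

/* Derive the containing page and the in-page offset from the address. */
static void vb_set_virt(struct vb_bvec *bv, const void *vaddr, unsigned int len)
{
	uintptr_t addr = (uintptr_t)vaddr;

	bv->page_base = addr & ~(uintptr_t)(VB_PAGE_SIZE - 1);
	bv->len = len;
	bv->offset = (unsigned int)(addr & (VB_PAGE_SIZE - 1));
}
```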

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:13 +0000 (16:06 +0100)]

block: add a bvec_set_folio helper

A smaller wrapper around bvec_set_page that takes a folio instead.
There are only two potential users for this in the tree, but the number
will grow in the future.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:06:12 +0000 (16:06 +0100)]

block: factor out a bvec_set_page helper

Add a helper to initialize a bvec based on a page pointer. This will help
remove various open-coded bvec initializations.
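
For illustration, a userspace model of such a helper next to the
open-coded form it replaces. The pb_* types are hypothetical stand-ins
with page, length, and offset fields; this is a sketch, not the kernel
header.

```c
/* Hypothetical stand-ins for a page and a bio vector. */
struct pb_page {
	int id;
};

struct pb_bvec {
	struct pb_page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Open-coded initialization: three assignments repeated at every call site. */
static void pb_init_open_coded(struct pb_bvec *bv, struct pb_page *page,
			       unsigned int len, unsigned int offset)
{
	bv->bv_page = page;
	bv->bv_len = len;
	bv->bv_offset = offset;
}

/* Helper form: one call that sets all three fields together. */
static inline void pb_set_page(struct pb_bvec *bv, struct pb_page *page,
			       unsigned int len, unsigned int offset)
{
	bv->bv_page = page;
	bv->bv_len = len;
	bv->bv_offset = offset;
}
```

Routing every call site through one helper keeps the field assignments
in a single place if the structure layout changes later.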

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:04:00 +0000 (16:04 +0100)]

blk-cgroup: move the cgroup information to struct gendisk

cgroup information only makes sense on a live gendisk that allows
file system I/O (which includes the raw block device). So move over
the cgroup related members.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:59 +0000 (16:03 +0100)]

blk-cgroup: pass a gendisk to blkg_lookup

Pass a gendisk to blkg_lookup and use that to find the match as part
of phasing out usage of the request_queue in the blk-cgroup code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:58 +0000 (16:03 +0100)]

blk-cgroup: pass a gendisk to pd_alloc_fn

There is no need for the request_queue here; pass a gendisk and extract
the node ids from that.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:57 +0000 (16:03 +0100)]

blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy

Prepare for storing the blkcg information in the gendisk instead of
the request_queue.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:56 +0000 (16:03 +0100)]

blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos

This is what about half of the users already want, and it's only going to
grow more.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:55 +0000 (16:03 +0100)]

blk-rq-qos: constify rq_qos_ops

These op vectors are constant, so mark them const.
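
A small sketch of the pattern, with hypothetical co_* names: the
operations table is defined const, and the object that uses it holds a
pointer to const, so the table can be placed in read-only data and cannot
be modified through that pointer.

```c
/* Hypothetical operations table with a single callback. */
struct co_ops {
	int (*throttle)(int depth);
};

static int co_throttle(int depth)
{
	return depth > 0 ? depth - 1 : 0;
}

/* The table never changes after build time, so it is declared const. */
static const struct co_ops co_default_ops = {
	.throttle = co_throttle,
};

/* Users keep a pointer to const and can only call through it. */
struct co_qos {
	const struct co_ops *ops;
};
```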

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:54 +0000 (16:03 +0100)]

blk-rq-qos: make rq_qos_add and rq_qos_del more useful

Switch to passing a gendisk, make rq_qos_add initialize all required
fields, and drop the unneeded q argument from rq_qos_del.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:53 +0000 (16:03 +0100)]

blk-rq-qos: move rq_qos_add and rq_qos_del out of line

These two functions are rather large and not in a fast path, so move
them out of line.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:52 +0000 (16:03 +0100)]

blk-wbt: open code wbt_queue_depth_changed in wbt_init

wbt_queue_depth_changed just updates a field and calls another function.
Open code it in wbt_init, so that the local queue variable can be used
instead of the one stored in the rq_qos. This will allow delaying that
rq_qos->queue assignment in a subsequent patch.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:51 +0000 (16:03 +0100)]

blk-wbt: move private information from blk-wbt.h to blk-wbt.c

A large part of blk-wbt.h is only used in blk-wbt.c, so move it there.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:50 +0000 (16:03 +0100)]

blk-wbt: pass a gendisk to wbt_init

Pass a gendisk to wbt_init to prepare for phasing out usage of the
request_queue in the blk-cgroup code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:49 +0000 (16:03 +0100)]

blk-wbt: pass a gendisk to wbt_{enable,disable}_default

Pass a gendisk to wbt_enable_default and wbt_disable_default to
prepare for phasing out usage of the request_queue in the blk-cgroup
code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:48 +0000 (16:03 +0100)]

blk-cgroup: store a gendisk to throttle in struct task_struct

Switch from a request_queue pointer and reference to a gendisk one
for the throttle information in struct task_struct.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:47 +0000 (16:03 +0100)]

blk-cgroup: pin the gendisk in struct blkcg_gq

Currently each blkcg_gq holds a request_queue reference, which is what
is used in the policies. But a lot of these interfaces will move over to
use a gendisk, so store a disk in struct blkcg_gq and hold a reference to
it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Andreas Herrmann <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 3 Feb 2023 15:03:46 +0000 (16:03 +0100)]

blk-cgroup: remove the !bdi->dev check in blkg_dev_name

bdi_dev_name already performs the same check.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
