Git Repo - linux.git/log

]> Git Repo - linux.git/log

projects / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Darrick J. Wong [Tue, 13 Oct 2020 15:46:27 +0000 (08:46 -0700)]

xfs: fix high key handling in the rt allocator's query_range function

Fix some off-by-one errors in xfs_rtalloc_query_range.  The highest key
in the realtime bitmap is always one less than the number of rt extents,
which means that the key clamp at the start of the function is wrong.
The 4th argument to xfs_rtfind_forw is the highest rt extent that we
want to probe, which means that passing 1 less than the high key is
wrong.  Finally, drop the rem variable that controls the loop because we
can compare the iteration point (rtstart) against the high key directly.

The sordid history of this function is that the original commit (fb3c3)
incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter
to xfs_rtfind_forw.  This was wrong because the "high key" is supposed
to be the largest key for which the caller wants result rows, not the
key for the first row that could possibly be outside the range that the
caller wants to see.

A subsequent attempt (8ad56) to strengthen the parameter checking added
incorrect clamping of the parameters to the number of rt blocks in the
system (despite the bitmap functions all taking units of rt extents) to
avoid querying ranges past the end of rt bitmap file but failed to fix
the incorrect _rtfind_forw parameter.  The original _rtfind_forw
parameter error then survived the conversion of the startblock and
blockcount fields to rt extents (a0e5c), and the most recent off-by-one
fix (a3a37) thought it was patching a problem when the end of the rt
volume is not in use, but none of these fixes actually solved the
original problem that the author was confused about the "limit" argument
to xfs_rtfind_forw.

Sadly, all four of these patches were written by this author and even
his own usage of this function and rt testing were inadequate to get
this fixed quickly.

Original-problem: fb3c3de2f65c ("xfs: add a couple of queries to iterate free extents in the rtbitmap")
Not-fixed-by: 8ad560d2565e ("xfs: strengthen rtalloc query range checks")
Not-fixed-by: a0e5c435babd ("xfs: fix xfs_rtalloc_rec units")
Fixes: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Thu, 8 Oct 2020 14:53:33 +0000 (07:53 -0700)]

xfs: annotate grabbing the realtime bitmap/summary locks in growfs

Use XFS_ILOCK_RT{BITMAP,SUM} to annotate grabbing the rt bitmap and
summary locks when we grow the realtime volume, just like we do most
everywhere else.  This shuts up lockdep warnings about grabbing the
ILOCK class of locks recursively:

============================================
WARNING: possible recursive locking detected
5.9.0-rc4-djw #rc4 Tainted: G           O
--------------------------------------------
xfs_growfs/4841 is trying to acquire lock:
ffff888035acc230 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]

but task is already holding lock:
ffff888035acedb0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_nondir_ilock_class);
  lock(&xfs_nondir_ilock_class);

*** DEADLOCK ***

May be due to missing lock nesting notation

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Oct 2020 20:57:52 +0000 (13:57 -0700)]

xfs: make xfs_growfs_rt update secondary superblocks

When we call growfs on the data device, we update the secondary
superblocks to reflect the updated filesystem geometry. We need to do
this for growfs on the realtime volume too, because a future xfs_repair
run could try to fix the filesystem using a backup superblock.

This was observed by the online superblock scrubbers while running
xfs/233. One can also trigger this by growing an rt volume, cycling the
mount, and creating new rt files.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 7 Oct 2020 20:55:16 +0000 (13:55 -0700)]

xfs: fix realtime bitmap/summary file truncation when growing rt volume

The realtime bitmap and summary files are regular files that are hidden
away from the directory tree.  Since they're regular files, inode
inactivation will try to purge what it thinks are speculative
preallocations beyond the incore size of the file.  Unfortunately,
xfs_growfs_rt forgets to update the incore size when it resizes the
inodes, with the result that inactivating the rt inodes at unmount time
will cause their contents to be truncated.

Fix this by updating the incore size when we change the ondisk size as
part of updating the superblock.  Note that we don't do this when we're
allocating blocks to the rt inodes because we actually want those blocks
to get purged if the growfs fails.

This fixes corruption complaints from the online rtsummary checker when
running xfs/233.  Since that test requires rmap, one can also trigger
this by growing an rt volume, cycling the mount, and creating rt files.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 7 Oct 2020 00:50:15 +0000 (17:50 -0700)]

xfs: fix the indent in xfs_trans_mod_dquot

The formatting is strange in xfs_trans_mod_dquot, so do a reindent.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 7 Oct 2020 00:50:14 +0000 (17:50 -0700)]

xfs: do the ASSERT for the arguments O_{u,g,p}dqpp

If we pass in XFS_QMOPT_{U,G,P}QUOTA flags and different uid/gid/prid
than them currently associated with the inode, the arguments
O_{u,g,p}dqpp shouldn't be NULL, so add the ASSERT for them.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Thu, 1 Oct 2020 17:56:07 +0000 (10:56 -0700)]

xfs: fix deadlock and streamline xfs_getfsmap performance

Refactor xfs_getfsmap to improve its performance: instead of indirectly
calling a function that copies one record to userspace at a time, create
a shadow buffer in the kernel and copy the whole array once at the end.
On the author's computer, this reduces the runtime on his /home by ~20%.

This also eliminates a deadlock when running GETFSMAP against the
realtime device. The current code locks the rtbitmap to create
fsmappings and copies them into userspace, having not released the
rtbitmap lock. If the userspace buffer is an mmap of a sparse file that
itself resides on the realtime device, the write page fault will recurse
into the fs for allocation, which will deadlock on the rtbitmap lock.

Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Thu, 1 Oct 2020 17:56:07 +0000 (10:56 -0700)]

xfs: limit entries returned when counting fsmap records

If userspace asked fsmap to count the number of entries, we cannot
return more than UINT_MAX entries because fmh_entries is u32.
Therefore, stop counting if we hit this limit or else we will waste time
to return truncated results.

Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:58 +0000 (17:39 -0700)]

xfs: only relog deferred intent items if free space in the log gets low

Now that we have the ability to ask the log how far the tail needs to be
pushed to maintain its free space targets, augment the decision to relog
an intent item so that we only do it if the log has hit the 75% full
threshold. There's no point in relogging an intent into the same
checkpoint, and there's no need to relog if there's plenty of free space
in the log.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:51 +0000 (17:39 -0700)]

xfs: expose the log push threshold

Separate the computation of the log push threshold and the push logic in
xlog_grant_push_ail. This enables higher level code to determine (for
example) that it is holding on to a logged intent item and the log is so
busy that it is more than 75% full. In that case, it would be desirable
to move the log item towards the head to release the tail, which we will
cover in the next patch.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sun, 27 Sep 2020 23:18:13 +0000 (16:18 -0700)]

xfs: periodically relog deferred intent items

There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail.  Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:

Caller hands us a transaction t0 with D0-D3 attached.  The defer ops
chain will look like the following if the transaction rolls succeed:

t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0.

The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length.  Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward.  Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or are on filesystems with small logs and a lot of ongoing
metadata updates.

Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length.  Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.

The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:51 +0000 (17:39 -0700)]

xfs: change the order in which child and parent defer ops are finished

The defer ops code has been finishing items in the wrong order -- if a
top level defer op creates items A and B, and finishing item A creates
more defer ops A1 and A2, we'll put the new items on the end of the
chain and process them in the order A B A1 A2.  This is kind of weird,
since it's convenient for programmers to be able to think of A and B as
an ordered sequence where all the sub-tasks for A must finish before we
move on to B, e.g. A A1 A2 D.

Right now, our log intent items are not so complex that this matters,
but this will become important for the atomic extent swapping patchset.
In order to maintain correct reference counting of extents, we have to
unmap and remap extents in that order, and we want to complete that work
before moving on to the next range that the user wants to swap.  This
patch fixes defer ops to satsify that requirement.

The primary symptom of the incorrect order was noticed in an early
performance analysis of the atomic extent swap code.  An astonishingly
large number of deferred work items accumulated when userspace requested
an atomic update of two very fragmented files.  The cause of this was
traced to the same ordering bug in the inner loop of
xfs_defer_finish_noroll.

If the ->finish_item method of a deferred operation queues new deferred
operations, those new deferred ops are appended to the tail of the
pending work list.  To illustrate, say that a caller creates a
transaction t0 with four deferred operations D0-D3.  The first thing
defer ops does is roll the transaction to t1, leaving us with:

t1: D0(t0), D1(t0), D2(t0), D3(t0)

Let's say that finishing each of D0-D3 will create two new deferred ops.
After finish D0 and roll, we'll have the following chain:

t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1)

d4 and d5 were logged to t1.  Notice that while we're about to start
work on D1, we haven't actually completed all the work implied by D0
being finished.  So far we've been careful (or lucky) to structure the
dfops callers such that D1 doesn't depend on d4 or d5 being finished,
but this is a potential logic bomb.

There's a second problem lurking.  Let's see what happens as we finish
D1-D3:

t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2)
t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3)
t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)

Let's say that d4-d11 are simple work items that don't queue any other
operations, which means that we can complete each d4 and roll to t6:

t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
...
t11: d10(t4), d11(t4)
t12: d11(t4)
<done>

When we try to roll to transaction #12, we're holding defer op d11,
which we logged way back in t4.  This means that the tail of the log is
pinned at t4.  If the log is very small or there are a lot of other
threads updating metadata, this means that we might have wrapped the log
and cannot get roll to t11 because there isn't enough space left before
we'd run into t4.

Let's shift back to the original failure.  I mentioned before that I
discovered this flaw while developing the atomic file update code.  In
that scenario, we have a defer op (D0) that finds a range of file blocks
to remap, creates a handful of new defer ops to do that, and then asks
to be continued with however much work remains.

So, D0 is the original swapext deferred op.  The first thing defer ops
does is rolls to t1:

t1: D0(t0)

We try to finish D0, logging d1 and d2 in the process, but can't get all
the work done.  We log a done item and a new intent item for the work
that D0 still has to do, and roll to t2:

t2: D0'(t1), d1(t1), d2(t1)

We roll and try to finish D0', but still can't get all the work done, so
we log a done item and a new intent item for it, requeue D0 a second
time, and roll to t3:

t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2)

If it takes 48 more rolls to complete D0, then we'll finally dispense
with D0 in t50:

t50: D<fifty primes>(t49), d1(t1), ..., d102(t50)

We then try to roll again to get a chain like this:

t51: d1(t1), d2(t1), ..., d101(t50), d102(t50)
...
t152: d102(t50)
<done>

Notice that in rolling to transaction #51, we're holding on to a log
intent item for d1 that was logged in transaction #1.  This means that
the tail of the log is pinned at t1.  If the log is very small or there
are a lot of other threads updating metadata, this means that we might
have wrapped the log and cannot roll to t51 because there isn't enough
space left before we'd run into t1.  This is of course problem #2 again.

But notice the third problem with this scenario: we have 102 defer ops
tied to this transaction!  Each of these items are backed by pinned
kernel memory, which means that we risk OOM if the chains get too long.

Yikes.  Problem #1 is a subtle logic bomb that could hit someone in the
future; problem #2 applies (rarely) to the current upstream, and problem
#3 applies to work under development.

This is not how incremental deferred operations were supposed to work.
The dfops design of logging in the same transaction an intent-done item
and a new intent item for the work remaining was to make it so that we
only have to juggle enough deferred work items to finish that one small
piece of work.  Deferred log item recovery will find that first
unfinished work item and restart it, no matter how many other intent
items might follow it in the log.  Therefore, it's ok to put the new
intents at the start of the dfops chain.

For the first example, the chains look like this:

t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

For the second example, the chains look like this:

t1: D0(t0)
t2: d1(t1), d2(t1), D0'(t1)
t3: d2(t1), D0'(t1)
t4: D0'(t1)
t5: d1(t4), d2(t4), D0''(t4)
...
t148: D0<50 primes>(t147)
t149: d101(t148), d102(t148)
t150: d102(t148)
<done>

This actually sucks more for pinning the log tail (we try to roll to t10
while holding an intent item that was logged in t1) but we've solved
problem #1.  We've also reduced the maximum chain length from:

    sum(all the new items) + nr_original_items

to:

    max(new items that each original item creates) + nr_original_items

This solves problem #3 by sharply reducing the number of defer ops that
can be attached to a transaction at any given time.  The change makes
the problem of log tail pinning worse, but is improvement we need to
solve problem #2.  Actually solving #2, however, is left to the next
patch.

Note that a subsequent analysis of some hard-to-trigger reflink and COW
livelocks on extremely fragmented filesystems (or systems running a lot
of IO threads) showed the same symptoms -- uncomfortably large numbers
of incore deferred work items and occasional stalls in the transaction
grant code while waiting for log reservations.  I think this patch and
the next one will also solve these problems.

As originally written, the code used list_splice_tail_init instead of
list_splice_init, so change that, and leave a short comment explaining
our actions.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:51 +0000 (17:39 -0700)]

xfs: fix an incore inode UAF in xfs_bui_recover

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:50 +0000 (17:39 -0700)]

xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering

In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent. This also makes the error bailout code a bit less weird.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:50 +0000 (17:39 -0700)]

xfs: clean up bmap intent item recovery checking

The bmap intent item checking code in xfs_bui_item_recover is spread all
over the function. We should check the recovered log item at the top
before we allocate any resources or do anything else, so do that.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:50 +0000 (17:39 -0700)]

xfs: xfs_defer_capture should absorb remaining transaction reservation

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

Doing this means that the log item recovery functions get to determine
the transaction reservation instead of abusing tr_itruncate in yet
another part of xfs.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:49 +0000 (17:39 -0700)]

xfs: xfs_defer_capture should absorb remaining block reservations

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the remaining block reservations so
that when we continue the dfops chain, we can reserve the same number of
blocks to use. We capture the reservations for both data and realtime
volumes.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue. On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:37 +0000 (17:39 -0700)]

xfs: proper replay of deferred ops queued during log recovery

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 18:01:45 +0000 (11:01 -0700)]

xfs: remove XFS_LI_RECOVERED

The ->iop_recover method of a log intent item removes the recovered
intent item from the AIL by logging an intent done item and committing
the transaction, so it's superfluous to have this flag check. Nothing
else uses it, so get rid of the flag entirely.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sat, 26 Sep 2020 00:39:27 +0000 (17:39 -0700)]

xfs: remove xfs_defer_reset

Remove this one-line helper since the assert is trivially true in one
call site and the rest obscures a bitmask operation.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Dave Chinner [Wed, 30 Sep 2020 14:28:52 +0000 (07:28 -0700)]

xfs: fix finobt btree block recovery ordering

Nathan popped up on #xfs and pointed out that we fail to handle
finobt btree blocks in xlog_recover_get_buf_lsn(). This means they
always fall through the entire magic number matching code to "recover
immediately". Whilst most of the time this is the correct behaviour,
occasionally it will be incorrect and could potentially overwrite
more recent metadata because we don't check the LSN in the on disk
metadata at all.

This bug has been present since the finobt was first introduced, and
is a potential cause of the occasional xfs_iget_check_free_state()
failures we see that indicate that the inode btree state does not
match the on disk inode state.

Fixes: aafc3c246529 ("xfs: support the XFS_BTNUM_FINOBT free inode btree type")
Reported-by: Nathan Scott <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Pavel Reichl [Fri, 25 Sep 2020 18:11:37 +0000 (11:11 -0700)]

xfs: remove deprecated sysctl options

These optionr were for Irix compatibility, probably for clustered XFS
clients in a heterogenous cluster which contained both Irix & Linux
machines, so that behavior would be consistent. That doesn't exist anymore
and it's no longer needed.

Signed-off-by: Pavel Reichl <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
[darrick: actually state when the sysctls go away]
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Pavel Reichl [Fri, 25 Sep 2020 18:10:29 +0000 (11:10 -0700)]

xfs: remove deprecated mount options

ikeep/noikeep was a workaround for old DMAPI code which is no longer
relevant.

attr2/noattr2 - is for controlling upgrade behaviour from fixed attribute
fork sizes in the inode (attr1) and dynamic attribute fork sizes (attr2).
mkfs has defaulted to setting attr2 since 2007, hence just about every
XFS filesystem out there in production right now uses attr2.

Signed-off-by: Pavel Reichl <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
[darrick: fix minor typos]
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Fri, 25 Sep 2020 18:10:20 +0000 (11:10 -0700)]

xfs: directly call xfs_generic_create() for ->create() and ->mkdir()

The current create and mkdir handlers both call the xfs_vn_mknod()
which is a wrapper routine around xfs_generic_create() function.
Actually the create and mkdir handlers can directly call
xfs_generic_create() function and reduce the call chain.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Thu, 24 Sep 2020 16:42:55 +0000 (09:42 -0700)]

xfs: avoid shared rmap operations for attr fork extents

During code review, I noticed that the rmap code uses the (slower)
shared mappings rmap functions for any extent of a reflinked file, even
if those extents are for the attr fork, which doesn't support sharing.
We can speed up rmap a tiny bit by optimizing out this case.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Chandan Babu R <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Gao Xiang [Wed, 23 Sep 2020 16:26:31 +0000 (09:26 -0700)]

xfs: drop the obsolete comment on filestream locking

Since commit 1c1c6ebcf52 ("xfs: Replace per-ag array with a radix
tree"), there is no m_peraglock anymore, so it's hard to understand
the described situation since per-ag is no longer an array and no
need to reallocate, call xfs_filestream_flush() in growfs.

In addition, the race condition for shrink feature is quite confusing
to me currently as well. Get rid of it instead.

Signed-off-by: Gao Xiang <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 23 Sep 2020 16:14:55 +0000 (09:14 -0700)]

xfs: code cleanup in xfs_attr_leaf_entsize_{remote,local}

Cleanup the typedef usage, the unnecessary parentheses, the unnecessary
backslash and use the open-coded round_up call in
xfs_attr_leaf_entsize_{remote,local}.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 23 Sep 2020 16:13:28 +0000 (09:13 -0700)]

xfs: do the assert for all the log done items in xfs_trans_cancel

We should do the assert for all the log intent-done items if they appear
here. This patch detect intent-done items by the fact that their item ops
don't have iop_unpin and iop_push methods and also move the helper
xlog_item_is_intent to xfs_trans.h.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Tue, 22 Sep 2020 16:19:18 +0000 (09:19 -0700)]

xfs: remove the unused parameter id from xfs_qm_dqattach_one

Since we never use the second parameter id, so remove it from
xfs_qm_dqattach_one() function.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Thu, 17 Sep 2020 18:12:42 +0000 (11:12 -0700)]

xfs: remove the redundant crc feature check in xfs_attr3_rmt_verify

We already check whether the crc feature is enabled before calling
xfs_attr3_rmt_verify(), so remove the redundant feature check in that
function.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 16 Sep 2020 21:31:56 +0000 (14:31 -0700)]

xfs: fix some comments

Fix the comments to help people understand the code.

Signed-off-by: Kaixu Xia <[email protected]>
[darrick: fix the indenting problems too]
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 16 Sep 2020 21:31:55 +0000 (14:31 -0700)]

xfs: remove the unnecessary xfs_dqid_t type cast

Since the type prid_t and xfs_dqid_t both are uint32_t, seems the
type cast is unnecessary, so remove it.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 16 Sep 2020 21:31:55 +0000 (14:31 -0700)]

xfs: use the existing type definition for di_projid

We have already defined the project ID type prid_t, so maybe should
use it here.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Kaixu Xia [Wed, 16 Sep 2020 21:31:54 +0000 (14:31 -0700)]

xfs: remove the unused SYNCHRONIZE macro

There are no callers of the SYNCHRONIZE() macro, so remove it.

Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Gao Xiang [Tue, 22 Sep 2020 16:41:06 +0000 (09:41 -0700)]

xfs: clean up calculation of LR header blocks

Let's use DIV_ROUND_UP() to calculate log record header
blocks as what did in xlog_get_iclog_buffer_size() and
wrap up a common helper for log recovery.

Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Gao Xiang [Tue, 22 Sep 2020 16:41:06 +0000 (09:41 -0700)]

xfs: avoid LR buffer overrun due to crafted h_len

Currently, crafted h_len has been blocked for the log
header of the tail block in commit a70f9fe52daa ("xfs:
detect and handle invalid iclog size set by mkfs").

However, each log record could still have crafted h_len
and cause log record buffer overrun. So let's check
h_len vs buffer size for each log record as well.

Signed-off-by: Gao Xiang <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 21 Sep 2020 16:15:10 +0000 (09:15 -0700)]

xfs: don't release log intent items when recovery fails

Nowadays, log recovery will call ->release on the recovered intent items
if recovery fails. Therefore, it's redundant to release them from
inside the ->recover functions when they're about to return an error.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 21 Sep 2020 16:15:10 +0000 (09:15 -0700)]

xfs: attach inode to dquot in xfs_bui_item_recover

In the bmap intent item recovery code, we must be careful to attach the
inode to its dquots (if quotas are enabled) so that a change in the
shape of the bmap btree doesn't cause the quota counters to be
incorrect.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 21 Sep 2020 16:15:09 +0000 (09:15 -0700)]

xfs: log new intent items created as part of finishing recovered intent items

During a code inspection, I found a serious bug in the log intent item
recovery code when an intent item cannot complete all the work and
decides to requeue itself to get that done.  When this happens, the
item recovery creates a new incore deferred op representing the
remaining work and attaches it to the transaction that it allocated.  At
the end of _item_recover, it moves the entire chain of deferred ops to
the dummy parent_tp that xlog_recover_process_intents passed to it, but
fail to log a new intent item for the remaining work before committing
the transaction for the single unit of work.

xlog_finish_defer_ops logs those new intent items once recovery has
finished dealing with the intent items that it recovered, but this isn't
sufficient.  If the log is forced to disk after a recovered log item
decides to requeue itself and the system goes down before we call
xlog_finish_defer_ops, the second log recovery will never see the new
intent item and therefore has no idea that there was more work to do.
It will finish recovery leaving the filesystem in a corrupted state.

The same logic applies to /any/ deferred ops added during intent item
recovery, not just the one handling the remaining work.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 21 Sep 2020 16:15:09 +0000 (09:15 -0700)]

xfs: check dabtree node hash values when loading child blocks

When xchk_da_btree_block is loading a non-root dabtree block, we know
that the parent block had to have a (hashval, address) pointer to the
block that we just loaded. Check that the hashval in the parent matches
the block we just loaded.

This was found by fuzzing nbtree[3].hashval = ones in xfs/394.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 21 Sep 2020 16:15:08 +0000 (09:15 -0700)]

xfs: don't free rt blocks when we're doing a REMAP bunmapi call

When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
to be unmapped from the given file fork without the extent being freed.
We do this for non-rt files, but we forgot to do this for realtime
files. So far this isn't a big deal since nobody makes a bunmapi call
to a rt file with the REMAP flag set, but don't leave a logic bomb.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Chandan Babu R [Thu, 17 Sep 2020 18:12:08 +0000 (11:12 -0700)]

xfs: Set xfs_buf's b_ops member when zeroing bitmap/summary files

In xfs_growfs_rt(), we enlarge bitmap and summary files by allocating
new blocks for both files. For each of the new blocks allocated, we
allocate an xfs_buf, zero the payload, log the contents and commit the
transaction. Hence these buffers will eventually find themselves
appended to list at xfs_ail->ail_buf_list.

Later, xfs_growfs_rt() loops across all of the new blocks belonging to
the bitmap inode to set the bitmap values to 1. In doing so, it
allocates a new transaction and invokes the following sequence of
functions,
  - xfs_rtfree_range()
    - xfs_rtmodify_range()
      - xfs_rtbuf_get()
        We pass '&xfs_rtbuf_ops' as the ops pointer to xfs_trans_read_buf().
        - xfs_trans_read_buf()
  We find the xfs_buf of interest in per-ag hash table, invoke
  xfs_buf_reverify() which ends up assigning '&xfs_rtbuf_ops' to
  xfs_buf->b_ops.

On the other hand, if xfs_growfs_rt_alloc() had allocated a few blocks
for the bitmap inode and returned with an error, all the xfs_bufs
corresponding to the new bitmap blocks that have been allocated would
continue to be on xfs_ail->ail_buf_list list without ever having a
non-NULL value assigned to their b_ops members. An AIL flush operation
would then trigger the following warning message to be printed on the
console,

  XFS (loop0): _xfs_buf_ioapply: no buf ops on daddr 0x58 len 8
  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  CPU: 3 PID: 449 Comm: xfsaild/loop0 Not tainted 5.8.0-rc4-chandan-00038-g4d8c2b9de9ab-dirty #37
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  Call Trace:
   dump_stack+0x57/0x70
   _xfs_buf_ioapply+0x37c/0x3b0
   ? xfs_rw_bdev+0x1e0/0x1e0
   ? xfs_buf_delwri_submit_buffers+0xd4/0x210
   __xfs_buf_submit+0x6d/0x1f0
   xfs_buf_delwri_submit_buffers+0xd4/0x210
   xfsaild+0x2c8/0x9e0
   ? __switch_to_asm+0x42/0x70
   ? xfs_trans_ail_cursor_first+0x80/0x80
   kthread+0xfe/0x140
   ? kthread_park+0x90/0x90
   ret_from_fork+0x22/0x30

This message indicates that the xfs_buf had its b_ops member set to
NULL.

This commit fixes the issue by assigning "&xfs_rtbuf_ops" to b_ops
member of each of the xfs_bufs logged by xfs_growfs_rt_alloc().

Signed-off-by: Chandan Babu R <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Chandan Babu R [Wed, 16 Sep 2020 03:50:42 +0000 (20:50 -0700)]

xfs: Set xfs_buf type flag when growing summary/bitmap files

The following sequence of commands,

  mkfs.xfs -f -m reflink=0 -r rtdev=/dev/loop1,size=10M /dev/loop0
  mount -o rtdev=/dev/loop1 /dev/loop0 /mnt
  xfs_growfs  /mnt

... causes the following call trace to be printed on the console,

XFS: Assertion failed: (bip->bli_flags & XFS_BLI_STALE) || (xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF && xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF), file: fs/xfs/xfs_buf_item.c, line: 331
Call Trace:
xfs_buf_item_format+0x632/0x680
? kmem_alloc_large+0x29/0x90
? kmem_alloc+0x70/0x120
? xfs_log_commit_cil+0x132/0x940
xfs_log_commit_cil+0x26f/0x940
? xfs_buf_item_init+0x1ad/0x240
? xfs_growfs_rt_alloc+0x1fc/0x280
__xfs_trans_commit+0xac/0x370
xfs_growfs_rt_alloc+0x1fc/0x280
xfs_growfs_rt+0x1a0/0x5e0
xfs_file_ioctl+0x3fd/0xc70
? selinux_file_ioctl+0x174/0x220
ksys_ioctl+0x87/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x3e/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurs because the buffer being formatted has the value of
XFS_BLFT_UNKNOWN_BUF assigned to the 'type' subfield of
bip->bli_formats->blf_flags.

This commit fixes the issue by assigning one of XFS_BLFT_RTSUMMARY_BUF
and XFS_BLFT_RTBITMAP_BUF to the 'type' subfield of
bip->bli_formats->blf_flags before committing the corresponding
transaction.

Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Brian Foster [Wed, 16 Sep 2020 03:44:46 +0000 (20:44 -0700)]

xfs: drop extra transaction roll from inode extent truncate

The inode extent truncate path unmaps extents from the inode block
mapping, finishes deferred ops to free the associated extents and
then explicitly rolls the transaction before processing the next
extent. The latter extent roll is spurious as xfs_defer_finish()
always returns a clean transaction and automatically relogs inodes
attached to the transaction (with lock_flags == 0). This can
unnecessarily increase the number of log ticket regrants that occur
during a long running truncate operation. Remove the explicit
transaction roll.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Thu, 10 Sep 2020 17:57:17 +0000 (10:57 -0700)]

xfs: deprecate the V4 format

The V4 filesystem format contains known weaknesses in the on-disk format
that make metadata verification diffiult.  In addition, the format does
not support dates past 2038 and will not be upgraded to do so.  We
should start the process of retiring the old format to close off attack
surfaces and to encourage users to migrate onto V5.

Therefore, make XFS V4 support a configurable option.  For the first
period it will be default Y in case some distributors want to withdraw
support early; for the second period it will be default N so that anyone
who wishes to continue support can do so; and after that, support will
be removed from the kernel.  Dates for these events have been added to
the upstream kernel.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sun, 13 Sep 2020 17:16:41 +0000 (10:16 -0700)]

xfs: don't propagate RTINHERIT -> REALTIME when there is no rtdev

While running generic/042 with -drtinherit=1 set in MKFS_OPTIONS, I
observed that the kernel will gladly set the realtime flag on any file
created on the loopback filesystem even though that filesystem doesn't
actually have a realtime device attached. This leads to verifier
failures and doesn't make any sense, so be smarter about this.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Sun, 13 Sep 2020 17:16:40 +0000 (10:16 -0700)]

xfs: refactor inode flags propagation code

Hoist the code that propagates di_flags and di_flags2 from a parent to a
new child into separate functions. No functional changes.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 9 Sep 2020 21:21:06 +0000 (14:21 -0700)]

xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size

Make sure that any fallocate operation that requires the range to be
block-aligned also checks that the range is aligned to the realtime
extent size.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 9 Sep 2020 21:21:06 +0000 (14:21 -0700)]

xfs: make sure the rt allocator doesn't run off the end

There's an overflow bug in the realtime allocator. If the rt volume is
large enough to handle a single allocation request that is larger than
the maximum bmap extent length and the rt bitmap ends exactly on a
bitmap block boundary, it's possible that the near allocator will try to
check the freeness of a range that extends past the end of the bitmap.
This fails with a corruption error and shuts down the fs.

Therefore, constrain maxlen so that the range scan cannot run off the
end of the rt bitmap.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Zheng Bin [Wed, 9 Sep 2020 16:29:16 +0000 (09:29 -0700)]

xfs: Remove unneeded semicolon

Fixes coccicheck warning:

fs/xfs/xfs_icache.c:1214:2-3: Unneeded semicolon

Signed-off-by: Zheng Bin <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Fri, 4 Sep 2020 17:20:16 +0000 (10:20 -0700)]

xfs: force the log after remapping a synchronous-writes file

Commit 5833112df7e9 tried to make it so that a remap operation would
force the log out to disk if the filesystem is mounted with mandatory
synchronous writes. Unfortunately, that commit failed to handle the
case where the inode or the file descriptor require mandatory
synchronous writes.

Refactor the check into into a helper that will look for all three
conditions, and now we can treat reflink just like any other synchronous
write.

Fixes: 5833112df7e9 ("xfs: reflink should force the log out if mounted with wsync")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Mon, 7 Sep 2020 15:08:50 +0000 (08:08 -0700)]

xfs: Convert xfs_attr_sf macros to inline functions

xfs_attr_sf_totsize() requires access to xfs_inode structure, so, once
xfs_attr_shortform_addname() is its only user, move it to xfs_attr.c
instead of playing with more #includes.

Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Fri, 4 Sep 2020 16:51:39 +0000 (09:51 -0700)]

xfs: Use variable-size array for nameval in xfs_attr_sf_entry

nameval is a variable-size array, so, define it as it, and remove all
the -1 magic number subtractions

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Fri, 4 Sep 2020 16:51:31 +0000 (09:51 -0700)]

xfs: Remove typedef xfs_attr_shortform_t

Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Fri, 4 Sep 2020 16:51:30 +0000 (09:51 -0700)]

xfs: remove typedef xfs_attr_sf_entry_t

Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Tue, 1 Sep 2020 18:47:12 +0000 (11:47 -0700)]

xfs: Remove kmem_zalloc_large()

This patch aims to replace kmem_zalloc_large() with global kernel memory
API. So, all its callers are now using kvzalloc() directly, so kmalloc()
fallsback to vmalloc() automatically.

Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 17:00:01 +0000 (10:00 -0700)]

xfs: enable big timestamps

Enable the big timestamp feature.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 24 Aug 2020 18:58:01 +0000 (11:58 -0700)]

xfs: trace timestamp limits

Add a couple of tracepoints so that we can check the timestamp limits
being set on inodes and quotas.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:59:51 +0000 (09:59 -0700)]

xfs: widen ondisk quota expiration timestamps to handle y2038+

Enable the bigtime feature for quota timers. We decrease the accuracy
of the timers to ~4s in exchange for being able to set timers up to the
bigtime maximum.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:59:07 +0000 (09:59 -0700)]

xfs: widen ondisk inode timestamps to deal with y2038+

Redesign the ondisk inode timestamps to be a simple unsigned 64-bit
counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the
32-bit unix time epoch). This enables us to handle dates up to 2486,
which solves the y2038 problem.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 24 Aug 2020 23:01:34 +0000 (16:01 -0700)]

xfs: redefine xfs_ictimestamp_t

Redefine xfs_ictimestamp_t as a uint64_t typedef in preparation for the
bigtime functionality. Preserve the legacy structure format so that we
can let the compiler take care of the masking and shifting.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 24 Aug 2020 22:15:46 +0000 (15:15 -0700)]

xfs: redefine xfs_timestamp_t

Redefine xfs_timestamp_t as a __be64 typedef in preparation for the
bigtime functionality. Preserve the legacy structure format so that we
can let the compiler take care of masking and shifting.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:43 +0000 (09:58 -0700)]

xfs: move xfs_log_dinode_to_disk to the log recovery code

Move this function to xfs_inode_item_recover.c since there's only one
caller of it.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 21:08:23 +0000 (14:08 -0700)]

xfs: refactor quota timestamp coding

Refactor quota timestamp encoding and decoding into helper functions so
that we can add extra behavior in the next patch.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:42 +0000 (09:58 -0700)]

xfs: refactor default quota grace period setting code

Refactor the code that sets the default quota grace period into a helper
function so that we can override the ondisk behavior later.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:36 +0000 (09:58 -0700)]

xfs: refactor quota expiration timer modification

Define explicit limits on the range of quota grace period expiration
timeouts and refactor the code that modifies the timeouts into helpers
that clamp the values appropriately. Note that we'll refactor the
default grace period timer separately.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:02 +0000 (09:58 -0700)]

xfs: explicitly define inode timestamp range

Formally define the inode timestamp ranges that existing filesystems
support, and switch the vfs timetamp ranges to use it.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:01 +0000 (09:58 -0700)]

xfs: enable new inode btree counters feature

Enable the new inode btree counters feature.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 26 Aug 2020 17:48:50 +0000 (10:48 -0700)]

xfs: support inode btree blockcounts in online repair

Add the necessary bits to the online repair code to support logging the
inode btree counters when rebuilding the btrees, and to support fixing
the counters when rebuilding the AGI.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 26 Aug 2020 17:48:50 +0000 (10:48 -0700)]

xfs: support inode btree blockcounts in online scrub

Add the necessary bits to the online scrub code to check the inode btree
counters when enabled.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Wed, 26 Aug 2020 17:54:27 +0000 (10:54 -0700)]

xfs: use the finobt block counts to speed up mount times

Now that we have reliable finobt block counts, use them to speed up the
per-AG block reservation calculations at mount time.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Darrick J. Wong [Mon, 17 Aug 2020 16:58:01 +0000 (09:58 -0700)]

xfs: store inode btree block counts in AGI header

Add a btree block usage counters for both inode btrees to the AGI header
so that we don't have to walk the entire finobt at mount time to create
the per-AG reservations.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:47 +0000 (10:55 -0700)]

xfs: reuse _xfs_buf_read for re-reading the superblock

Instead of poking deeply into buffer cache internals when re-reading the
superblock during log recovery just generalize _xfs_buf_read and use it
there. Note that we don't have to explicitly set up the ops as they
must be set from the initial read.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:47 +0000 (10:55 -0700)]

xfs: remove xfs_getsb

Merge xfs_getsb into its only caller, and clean that one up a little bit
as well.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:47 +0000 (10:55 -0700)]

xfs: simplify xfs_trans_getsb

Remove the mp argument as this function is only called in transaction
context, and open code xfs_getsb given that the function already accesses
the buffer pointer in the mount point directly.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:46 +0000 (10:55 -0700)]

xfs: remove xlog_recover_iodone

The log recovery I/O completion handler does not substancially differ from
the normal one except for the fact that it:

a) never retries failed writes
b) can have log items that aren't on the AIL
c) never has inode/dquot log items attached and thus don't need to
handle them

Add conditionals for (a) and (b) to the ioend code, while (c) doesn't
need special handling anyway.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:46 +0000 (10:55 -0700)]

xfs: clear the read/write flags later in xfs_buf_ioend

Clear the flags at the end of xfs_buf_ioend so that they can be used
during the completion.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:46 +0000 (10:55 -0700)]

xfs: use xfs_buf_item_relse in xfs_buf_item_done

Reuse xfs_buf_item_relse instead of duplicating it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:45 +0000 (10:55 -0700)]

xfs: simplify the xfs_buf_ioend_disposition calling convention

Now that all the actual error handling is in a single place,
xfs_buf_ioend_disposition just needs to return true if took ownership of
the buffer, or false if not instead of the tristate. Also move the
error check back in the caller to optimize for the fast path, and give
the function a better fitting name.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:45 +0000 (10:55 -0700)]

xfs: lift the XBF_IOEND_FAIL handling into xfs_buf_ioend_disposition

Keep all the error handling code together.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:45 +0000 (10:55 -0700)]

xfs: remove xfs_buf_ioerror_retry

Merge xfs_buf_ioerror_retry into its only caller to make the resubmission
flow a little easier to follow.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:44 +0000 (10:55 -0700)]

xfs: refactor xfs_buf_ioerror_fail_without_retry

xfs_buf_ioerror_fail_without_retry is a somewhat weird function in
that it has two trivial checks that decide the return value, while
the rest implements a ratelimited warning. Just lift the two checks
into the caller, and give the remainder a suitable name.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:44 +0000 (10:55 -0700)]

xfs: fold xfs_buf_ioend_finish into xfs_ioend

No need to keep a separate helper for this logic.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:29 +0000 (10:55 -0700)]

xfs: move the buffer retry logic to xfs_buf.c

Move the buffer retry state machine logic to xfs_buf.c and call it once
from xfs_ioend instead of duplicating it three times for the three kinds
of buffers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:20 +0000 (10:55 -0700)]

xfs: refactor xfs_buf_ioend

Move the log recovery I/O completion handling entirely into the log
recovery code, and re-arrange the normal I/O completion handler flow
to prepare to lifting more logic into common code in the next commits.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:20 +0000 (10:55 -0700)]

xfs: mark xfs_buf_ioend static

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Tue, 1 Sep 2020 17:55:20 +0000 (10:55 -0700)]

xfs: refactor the buf ioend disposition code

Handle the no-error case in xfs_buf_iodone_error as well, and to clarify
the code rename the function, use the actual enum type as return value
and then switch on it in the callers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Dave Chinner [Mon, 17 Aug 2020 23:41:01 +0000 (16:41 -0700)]

xfs: xfs_iflock is no longer a completion

With the recent rework of the inode cluster flushing, we no longer
ever wait on the the inode flush "lock". It was never a lock in the
first place, just a completion to allow callers to wait for inode IO
to complete. We now never wait for flush completion as all inode
flushing is non-blocking. Hence we can get rid of all the iflock
infrastructure and instead just set and check a state flag.

Rename the XFS_IFLOCK flag to XFS_IFLUSHING, convert all the
xfs_iflock_nowait() test-and-set operations on that flag, and
replace all the xfs_ifunlock() calls to clear operations.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Carlos Maiolino [Wed, 26 Aug 2020 21:05:56 +0000 (14:05 -0700)]

xfs: remove kmem_realloc()

Remove kmem_realloc() function and convert its users to use MM API
directly (krealloc())

Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Mon, 7 Sep 2020 00:11:40 +0000 (17:11 -0700)]

Linux 5.9-rc4

commit | commitdiff | tree

Linus Torvalds [Sun, 6 Sep 2020 19:10:27 +0000 (12:10 -0700)]

Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block

Pull more io_uring fixes from Jens Axboe:
"Two followup fixes. One is fixing a regression from this merge window,
  the other is two commits fixing cancelation of deferred requests.

  Both have gone through full testing, and both spawned a few new
  regression test additions to liburing.

   - Don't play games with const, properly store the output iovec and
     assign it as needed.

   - Deferred request cancelation fix (Pavel)"

* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  io_uring: fix explicit async read/write mapping for large segments

commit | commitdiff | tree

Linus Torvalds [Sun, 6 Sep 2020 18:58:15 +0000 (11:58 -0700)]

Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

- three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
   pointer dereference bug and serialize a hardware register access as
   required by the VT-d spec.

- two patches for AMD IOMMU to force AMD GPUs into translation mode
   when memory encryption is active and disallow using IOMMUv2
   functionality.  This makes the AMDGPU driver work when memory
   encryption is active.

- two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
   Table Entries.

- MAINTAINERS file update for the Qualcom IOMMU driver.

* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Handle 36bit addressing for x86-32
  iommu/amd: Do not use IOMMUv2 functionality when SME is active
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
  iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
  iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
  iommu/vt-d: Serialize IOMMU GCMD register modifications
  MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move

commit | commitdiff | tree

Linus Torvalds [Sun, 6 Sep 2020 17:28:00 +0000 (10:28 -0700)]

Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- more generic entry code ABI fallout

- debug register handling bugfixes

- fix vmalloc mappings on 32-bit kernels

- kprobes instrumentation output fix on 32-bit kernels

- fix over-eager WARN_ON_ONCE() on !SMAP hardware

- NUMA debugging fix

- fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Unbreak 32bit fast syscall
  x86/debug: Allow a single level of #DB recursion
  x86/entry: Fix AC assertion
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386
  x86, fakenuma: Fix invalid starting node ID
  x86/mm/32: Bring back vmalloc faulting on x86_32
  x86/cmdline: Disable jump tables for cmdline.c

commit | commitdiff | tree

Linus Torvalds [Sun, 6 Sep 2020 16:59:27 +0000 (09:59 -0700)]

Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
"A small series for fixing a problem with Xen PVH guests when running
  as backends (e.g. as dom0).

  Mapping other guests' memory is now working via ZONE_DEVICE, thus not
  requiring to abuse the memory hotplug functionality for that purpose"

* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: add helpers to allocate unpopulated memory
  memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
  xen/balloon: add header guard

commit | commitdiff | tree

Pavel Begunkov [Sat, 5 Sep 2020 21:45:15 +0000 (00:45 +0300)]

io_uring: fix linked deferred ->files cancellation

While looking for ->files in ->defer_list, consider that requests there
may actually be links.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Pavel Begunkov [Sat, 5 Sep 2020 21:45:14 +0000 (00:45 +0300)]

io_uring: fix cancel of deferred reqs with ->files

While trying to cancel requests with ->files, it also should look for
requests in ->defer_list, otherwise it might end up hanging a thread.

Cancel all requests in ->defer_list up to the last request there with
matching ->files, that's needed to follow drain ordering semantics.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Sat, 5 Sep 2020 21:22:46 +0000 (14:22 -0700)]

Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux

Pull misc fixes from Miguel Ojeda:
"A trivial patch for auxdisplay:

   - Replace HTTP links with HTTPS ones (Alexander A. Klimov)

  The usual clang-format trivial update:

   - Update with the latest for_each macro list (Miguel Ojeda)

  And Luc requested me to pick a sparse fix on my queue, so here it goes
  along with other two trivial Compiler Attributes ones (also from Luc).

   - sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
     Oostenryck)

   - Compiler Attributes: fix comment concerning GCC 4.6 (Luc Van
     Oostenryck)

   - Compiler Attributes: remove comment about sparse not supporting
     __has_attribute (Luc Van Oostenryck)"

* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  auxdisplay: Replace HTTP links with HTTPS ones

* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  clang-format: Update with the latest for_each macro list

* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
  sparse: use static inline for __chk_{user,io}_ptr()
  Compiler Attributes: fix comment concerning GCC 4.6
  Compiler Attributes: remove comment about sparse not supporting __has_attribute

commit | commitdiff | tree

Linus Torvalds [Sat, 5 Sep 2020 20:46:14 +0000 (13:46 -0700)]

Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

- HSDK-4xd Dev system: perf driver updates for sampling interrupt

- HSDK* Dev System: Ethernet broken [Evgeniy Didin]

- HIGHMEM broken (2 memory banks) [Mike Rapoport]

- show_regs() rewrite once and for all

- Other minor fixes

* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
  arc: fix memory initialization for systems with two memory banks
  irqchip/eznps: Fix build error for !ARC700 builds
  ARC: show_regs: fix r12 printing and simplify
  ARC: HSDK: wireup perf irq
  ARC: perf: don't bail setup if pct irq missing in device-tree
  ARC: pgalloc.h: delete a duplicated word + other fixes

commit | commitdiff | tree

Linus Torvalds [Sat, 5 Sep 2020 20:28:40 +0000 (13:28 -0700)]

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"19 patches.

  Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
  checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
  hugetlb)"

* emailed patches from Andrew Morton <[email protected]>:
  include/linux/log2.h: add missing () around n in roundup_pow_of_two()
  mm/khugepaged.c: fix khugepaged's request size in collapse_file
  mm/hugetlb: fix a race between hugetlb sysctl handlers
  mm/hugetlb: try preferred node first when alloc gigantic page from cma
  mm/migrate: preserve soft dirty in remove_migration_pte()
  mm/migrate: remove unnecessary is_zone_device_page() check
  mm/rmap: fixup copying of soft dirty and uffd ptes
  mm/migrate: fixup setting UFFD_WP flag
  mm: madvise: fix vma user-after-free
  checkpatch: fix the usage of capture group ( ... )
  fork: adjust sysctl_max_threads definition to match prototype
  ipc: adjust proc_ipc_sem_dointvec definition to match prototype
  mm: track page table modifications in __apply_to_page_range()
  MAINTAINERS: IA64: mark Status as Odd Fixes only
  MAINTAINERS: add LLVM maintainers
  MAINTAINERS: update Cavium/Marvell entries
  mm: slub: fix conversion of freelist_corrupted()
  mm: memcg: fix memcg reclaim soft lockup
  memcg: fix use-after-free in uncharge_batch

commit | commitdiff | tree

Jason Gunthorpe [Fri, 4 Sep 2020 23:36:19 +0000 (16:36 -0700)]

include/linux/log2.h: add missing () around n in roundup_pow_of_two()

Otherwise gcc generates warnings if the expression is complicated.

Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Empty description

This page took 0.119225 seconds and 4 git commands to generate.