Qu Wenruo [Mon, 4 May 2020 23:58:20 +0000 (07:58 +0800)]
btrfs: block-group: refactor how we read one block group item
Structure btrfs_block_group has the following members which are
currently read from on-disk block group item and key:
- length - from item key
- used
- flags - from block group item
However for incoming skinny block group tree, we are going to read those
members from different sources.
This patch will refactor such read by:
- Don't initialize btrfs_block_group::length at allocation
Caller should initialize them manually.
Also to avoid possible (well, only two callers) missing
initialization, add extra ASSERT() in btrfs_add_block_group_cache().
- Refactor length/used/flags initialization into one function
The new function, fill_one_block_group() will handle the
initialization of such members.
- Use btrfs_block_group::length to replace key::offset
Since skinny block group item would have a different meaning for its
key offset.
Whenever a chown is executed, all capabilities of the file being touched
are lost. When doing incremental send with a file with capabilities,
there is a situation where the capability can be lost on the receiving
side. The sequence of actions bellow shows the problem:
At this point, only a chown was emitted by "btrfs send" since only the
group was changed. This makes the cap_sys_nice capability to be dropped
from fs2/snap_incremental/foo.bar
To fix that, only emit capabilities after chown is emitted. The current
code first checks for xattrs that are new/changed, emits them, and later
emit the chown. Now, __process_new_xattr skips capabilities, letting
only finish_inode_if_needed to emit them, if they exist, for the inode
being processed.
This behavior was being worked around in "btrfs receive" side by caching
the capability and only applying it after chown. Now, xattrs are only
emmited _after_ chown, making that workaround not needed anymore.
Filipe Manana [Fri, 8 May 2020 10:02:07 +0000 (11:02 +0100)]
btrfs: scrub, only lookup for csums if we are dealing with a data extent
When scrubbing a stripe, whenever we find an extent we lookup for its
checksums in the checksum tree. However we do it even for metadata extents
which don't have checksum items stored in the checksum tree, that is
only for data extents.
So make the lookup for checksums only if we are processing with a data
extent.
Filipe Manana [Fri, 8 May 2020 10:01:59 +0000 (11:01 +0100)]
btrfs: move the block group freeze/unfreeze helpers into block-group.c
The helpers btrfs_freeze_block_group() and btrfs_unfreeze_block_group()
used to be named btrfs_get_block_group_trimming() and
btrfs_put_block_group_trimming() respectively.
At the time they were added to free-space-cache.c, by commit e33e17ee1098
("btrfs: add missing discards when unpinning extents with -o discard")
because all the trimming related functions were in free-space-cache.c.
Now that the helpers were renamed and are used in scrub context as well,
move them to block-group.c, a much more logical location for them.
Filipe Manana [Fri, 8 May 2020 10:01:47 +0000 (11:01 +0100)]
btrfs: rename member 'trimming' of block group to a more generic name
Back in 2014, commit 04216820fe83d5 ("Btrfs: fix race between fs trimming
and block group remove/allocation"), I added the 'trimming' member to the
block group structure. Its purpose was to prevent races between trimming
and block group deletion/allocation by pinning the block group in a way
that prevents its logical address and device extents from being reused
while trimming is in progress for a block group, so that if another task
deletes the block group and then another task allocates a new block group
that gets the same logical address and device extents while the trimming
task is still in progress.
After the previous fix for scrub (patch "btrfs: fix a race between scrub
and block group removal/allocation"), scrub now also has the same needs that
trimming has, so the member name 'trimming' no longer makes sense.
Since there is already a 'pinned' member in the block group that refers
to space reservations (pinned bytes), rename the member to 'frozen',
add a comment on top of it to describe its general purpose and rename
the helpers to increment and decrement the counter as well, to match
the new member name.
The next patch in the series will move the helpers into a more suitable
file (from free-space-cache.c to block-group.c).
Filipe Manana [Fri, 8 May 2020 10:01:10 +0000 (11:01 +0100)]
btrfs: fix a race between scrub and block group removal/allocation
When scrub is verifying the extents of a block group for a device, it is
possible that the corresponding block group gets removed and its logical
address and device extents get used for a new block group allocation.
When this happens scrub incorrectly reports that errors were detected
and, if the the new block group has a different profile then the old one,
deleted block group, we can crash due to a null pointer dereference.
Possibly other unexpected and weird consequences can happen as well.
Consider the following sequence of actions that leads to the null pointer
dereference crash when scrub is running in parallel with balance:
1) Balance sets block group X to read-only mode and starts relocating it.
Block group X is a metadata block group, has a raid1 profile (two
device extents, each one in a different device) and a logical address
of 19424870400;
2) Scrub is running and finds device extent E, which belongs to block
group X. It enters scrub_stripe() to find all extents allocated to
block group X, the search is done using the extent tree;
3) Balance finishes relocating block group X and removes block group X;
4) Balance starts relocating another block group and when trying to
commit the current transaction as part of the preparation step
(prepare_to_relocate()), it blocks because scrub is running;
5) The scrub task finds the metadata extent at the logical address 19425001472 and marks the pages of the extent to be read by a bio
(struct scrub_bio). The extent item's flags, which have the bit
BTRFS_EXTENT_FLAG_TREE_BLOCK set, are added to each page (struct
scrub_page). It is these flags in the scrub pages that tells the
bio's end io function (scrub_bio_end_io_worker) which type of extent
it is dealing with. At this point we end up with 4 pages in a bio
which is ready for submission (the metadata extent has a size of
16Kb, so that gives 4 pages on x86);
6) At the next iteration of scrub_stripe(), scrub checks that there is a
pause request from the relocation task trying to commit a transaction,
therefore it submits the pending bio and pauses, waiting for the
transaction commit to complete before resuming;
7) The relocation task commits the transaction. The device extent E, that
was used by our block group X, is now available for allocation, since
the commit root for the device tree was swapped by the transaction
commit;
8) Another task doing a direct IO write allocates a new data block group Y
which ends using device extent E. This new block group Y also ends up
getting the same logical address that block group X had: 19424870400.
This happens because block group X was the block group with the highest
logical address and, when allocating Y, find_next_chunk() returns the
end offset of the current last block group to be used as the logical
address for the new block group, which is
So our new block group Y has the same logical address and device extent
that block group X had. However Y is a data block group, while X was
a metadata one, and Y has a raid0 profile, while X had a raid1 profile;
9) After allocating block group Y, the direct IO submits a bio to write
to device extent E;
10) The read bio submitted by scrub reads the 4 pages (16Kb) from device
extent E, which now correspond to the data written by the task that
did a direct IO write. Then at the end io function associated with
the bio, scrub_bio_end_io_worker(), we call scrub_block_complete()
which calls scrub_checksum(). This later function checks the flags
of the first page, and sees that the bit BTRFS_EXTENT_FLAG_TREE_BLOCK
is set in the flags, so it assumes it has a metadata extent and
then calls scrub_checksum_tree_block(). That functions returns an
error, since interpreting data as a metadata extent causes the
checksum verification to fail.
So this makes scrub_checksum() call scrub_handle_errored_block(),
which determines 'failed_mirror_index' to be 1, since the device
extent E was allocated as the second mirror of block group X.
It allocates BTRFS_MAX_MIRRORS scrub_block structures as an array at
'sblocks_for_recheck', and all the memory is initialized to zeroes by
kcalloc().
After that it calls scrub_setup_recheck_block(), which is responsible
for filling each of those structures. However, when that function
calls btrfs_map_sblock() against the logical address of the metadata
extent, 19425001472, it gets a struct btrfs_bio ('bbio') that matches
the current block group Y. However block group Y has a raid0 profile
and not a raid1 profile like X had, so the following call returns 1:
scrub_nr_raid_mirrors(bbio)
And as a result scrub_setup_recheck_block() only initializes the
first (index 0) scrub_block structure in 'sblocks_for_recheck'.
Then scrub_recheck_block() is called by scrub_handle_errored_block()
with the second (index 1) scrub_block structure as the argument,
because 'failed_mirror_index' was previously set to 1.
This scrub_block was not initialized by scrub_setup_recheck_block(),
so it has zero pages, its 'page_count' member is 0 and its 'pagev'
page array has all members pointing to NULL.
Finally when scrub_recheck_block() calls scrub_recheck_block_checksum()
we have a NULL pointer dereference when accessing the flags of the first
page, as pavev[0] is NULL:
This issue has been around for a long time, possibly since scrub exists.
The last time I ran into it was over 2 years ago. After recently fixing
fstests to pass the "--full-balance" command line option to btrfs-progs
when doing balance, several tests started to more heavily exercise balance
with fsstress, scrub and other operations in parallel, and therefore
started to hit this issue again (with btrfs/061 for example).
Fix this by having scrub increment the 'trimming' counter of the block
group, which pins the block group in such a way that it guarantees neither
its logical address nor device extents can be reused by future block group
allocations until we decrement the 'trimming' counter. Also make sure that
on each iteration of scrub_stripe() we stop scrubbing the block group if
it was removed already.
A later patch in the series will rename the block group's 'trimming'
counter and its helpers to a more generic name, since now it is not used
exclusively for pinning while trimming anymore.
David Sterba [Mon, 11 May 2020 12:49:10 +0000 (14:49 +0200)]
btrfs: remove more obsolete v0 extent ref declarations
The extent references v0 have been superseded long time go, there are
some unused declarations of access helpers. We can safely remove them
now. The struct btrfs_extent_ref_v0 is not used anywhere, but struct
btrfs_extent_item_v0 is still part of a backward compatibility check in
relocation.c and thus not removed.
[CAUSE]
The problem is in btrfs_qgroup_inherit(), we don't have good enough
check to determine if the new relation would break the existing
accounting.
Unlike btrfs_add_qgroup_relation(), which has proper check to determine
if we can do quick update without a rescan, in btrfs_qgroup_inherit() we
can even assign a snapshot to multiple qgroups.
[FIX]
Fix it by manually marking qgroup inconsistent for snapshot inheritance.
For subvolume creation, since all its extents are exclusively owned, we
don't need to rescan.
In theory, we should call relation check like quick_update_accounting()
when doing qgroup inheritance and inform user about qgroup accounting
inconsistency.
But we don't have good mechanism to relay that back to the user in the
snapshot creation context, thus we can only silently mark the qgroup
inconsistent.
Anyway, user shouldn't use qgroup inheritance during snapshot creation,
and should add qgroup relationship after snapshot creation by 'btrfs
qgroup assign', which has a much better UI to inform user about qgroup
inconsistent and kick in rescan automatically.
Robbie Ko [Thu, 7 May 2020 02:54:40 +0000 (10:54 +0800)]
btrfs: speedup dead root detection during orphan cleanup
When mounting, we handle deleted subvolume and orphan items. First,
find add orphan roots, then add them to fs_root radix tree. Second, in
tree-root, process each orphan item, skip if it is dead root.
The original algorithm is based on the list of dead_roots, one by one to
visit and check whether the objectid is consistent, the time complexity
is O (n ^ 2). When processing 50000 deleted subvols, it takes about
120s.
Because btrfs_find_orphan_roots has already ran before us, and added
deleted subvol to fs_roots radix tree.
The fs root will only be removed from the fs_roots radix tree after the
cleaner process is started, and the cleaner will only start execution
after the mount is complete.
btrfs_orphan_cleanup can be called during the whole filesystem mount
lifetime, but only "tree root" will be used in this section of code, and
only mount time will be brought into tree root.
So we can quickly check whether the orphan item is dead root through the
fs_roots radix tree.
David Sterba [Tue, 28 Apr 2020 15:10:29 +0000 (17:10 +0200)]
btrfs: add more codes to decoder table
I've grepped logs for 'errno=.*unknown' and found -95, -117 and -122,
now added to the table. The wording is adjusted so it makes sense in
context of filesystem.
Anand Jain [Mon, 4 May 2020 18:58:26 +0000 (02:58 +0800)]
btrfs: free alien device after device add
When an old device has new fsid through 'btrfs device add -f <dev>' our
fs_devices list has an alien device in one of the fs_devices lists.
By having an alien device in fs_devices, we have two issues so far
1. missing device does not not show as missing in the userland
2. degraded mount will fail
Both issues are caused by the fact that there's an alien device in the
fs_devices list. (Alien means that it does not belong to the filesystem,
identified by fsid, or does not contain btrfs filesystem at all, eg. due
to overwrite).
A device can be scanned/added through the control device ioctls
SCAN_DEV, DEVICES_READY or by ADD_DEV.
And device coming through the control device is checked against the all
other devices in the lists, but this was not the case for ADD_DEV.
This patch fixes both issues above by removing the alien device.
Anand Jain [Mon, 4 May 2020 18:58:25 +0000 (02:58 +0800)]
btrfs: include non-missing as a qualifier for the latest_bdev
btrfs_free_extra_devids() updates fs_devices::latest_bdev to point to
the bdev with greatest device::generation number. For a typical-missing
device the generation number is zero so fs_devices::latest_bdev will
never point to it.
But if the missing device is due to alienation [1], then
device::generation is not zero and if it is greater or equal to the rest
of device generations in the list, then fs_devices::latest_bdev ends up
pointing to the missing device and reports the error like [2].
[1] We maintain devices of a fsid (as in fs_device::fsid) in the
fs_devices::devices list, a device is considered as an alien device
if its fsid does not match with the fs_device::fsid
Filipe Manana [Tue, 21 Apr 2020 10:25:31 +0000 (11:25 +0100)]
btrfs: remove useless check for copy_items() return value
At btrfs_log_prealloc_extents() we are checking if copy_items() returns a
value greater than 0. That used to happen in the past to signal the caller
that the path given to it was released and reused for other searches, but
as of commit 0e56315ca147b3 ("Btrfs: fix missing hole after hole punching
and fsync when using NO_HOLES"), the copy_items() function does not have
that behaviour anymore and always returns 0 or a negative value. So just
remove that check at btrfs_log_prealloc_extents(), which the previously
mentioned commit forgot to remove.
Currently, direct I/O has its own versions of bio_readpage_error() and
btrfs_check_repairable() (dio_read_error() and
btrfs_check_dio_repairable(), respectively). The main difference is that
the direct I/O version doesn't do read validation. The rework of direct
I/O repair makes it possible to do validation, so we can get rid of
btrfs_check_dio_repairable() and combine bio_readpage_error() and
dio_read_error() into a new helper, btrfs_submit_read_repair().
This was originally added in commit 8b110e393c5a ("Btrfs: implement
repair function when direct read fails") to avoid a deadlock. In that
commit, the direct I/O read endio executes on the endio_workers
workqueue, submits a repair bio, and waits for it to complete. The
repair bio endio must execute on a different workqueue, otherwise it
could block on the endio_workers workqueue becoming available, which
won't happen because the original endio is blocked on the repair bio.
As of the previous commit, the original endio doesn't wait for the
repair bio, so this separate workqueue is unnecessary.
Direct I/O read repair was originally implemented in commit 8b110e393c5a
("Btrfs: implement repair function when direct read fails"). This
implementation is unnecessarily complicated. There is major code
duplication between __btrfs_subio_endio_read() (checks checksums and
handles I/O errors for files with checksums),
__btrfs_correct_data_nocsum() (handles I/O errors for files without
checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
for retries of files with checksums), and btrfs_retry_endio_nocsum()
(handles I/O errors for retries of files without checksum). If it sounds
like these should be one function, that's because they should.
Additionally, these functions are very hard to follow due to their
excessive use of goto.
This commit replaces the original implementation. After the previous
commit getting rid of orig_bio, we can reuse the same endio callback for
repair I/O and the original I/O, we just need to track the file offset
and original iterator in the repair bio. We can also unify the handling
of files with and without checksums and simplify the control flow. We
also no longer have to wait for each repair I/O to complete one by one.
In the worst case, there are _4_ layers of bios in the Btrfs direct I/O
path:
1. The bio created by the generic direct I/O code (dio_bio).
2. A clone of dio_bio we create in btrfs_submit_direct() to represent
the entire direct I/O range (orig_bio).
3. A partial clone of orig_bio limited to the size of a RAID stripe that
we create in btrfs_submit_direct_hook().
4. Clones of each of those split bios for each RAID stripe that we
create in btrfs_map_bio().
As of the previous commit, the second layer (orig_bio) is no longer
needed for anything: we can split dio_bio instead, and complete dio_bio
directly when all of the cloned bios complete. This lets us clean up a
bunch of cruft, including dip->subio_endio and dip->errors (we can use
dio_bio->bi_status instead). It also enables the next big cleanup of
direct I/O read repair.
btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
The next commit will get rid of btrfs_dio_private->orig_bio. The only
thing we really need it for is containing all of the checksums, but we
can easily put the checksum array in btrfs_dio_private and have the
submitted bios reference the array. We can also look the checksums up
while we're setting up instead of the current awkward logic that looks
them up for orig_bio when the first split bio is submitted.
(Interestingly, btrfs_dio_private did contain the
checksums before commit 23ea8e5a0767 ("Btrfs: load checksum data once
when submitting a direct read io"), but it didn't look them up up
front.)
Since its introduction in commit 2fe6303e7cd0 ("Btrfs: split
bio_readpage_error into several functions"), btrfs_check_repairable()
has only been used from extent_io.c where it is defined.
Fix a couple of issues in the btrfs_lookup_bio_sums documentation:
* The bio doesn't need to be a btrfs_io_bio if dst was provided. Move
the declaration in the code to make that clear, too.
* dst must be large enough to hold nblocks * csum_size, not just
csum_size.
btrfs: don't do repair validation for checksum errors
The purpose of the validation step is to distinguish between good and
bad sectors in a failed multi-sector read. If a multi-sector read
succeeded but some of those sectors had checksum errors, we don't need
to validate anything; we know the sectors with bad checksums need to be
repaired.
Read repair does two things: it finds a good copy of data to return to
the reader, and it corrects the bad copy on disk. If a read of multiple
sectors has an I/O error, repair does an extra "validation" step that
issues a separate read for each sector. This allows us to find the exact
failing sectors and only rewrite those.
This heuristic is implemented in
bio_readpage_error()/btrfs_check_repairable() as:
failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT;
if (failed_bio_pages > 1)
do validation
However, at this point, bi_iter may have already been advanced. This
means that we'll skip the validation step and rewrite the entire failed
read.
Fix it by getting the actual size from the biovec (which we can do
because this is only called for non-cloned bios, although that will
change in a later commit).
Fixes: 8a2ee44a371c ("btrfs: look at bi_size for repair decisions") Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: David Sterba <[email protected]>
btrfs: fix double __endio_write_update_ordered in direct I/O
In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
we complete the ordered extent range. However, we don't mark that the
range doesn't need to be cleaned up from btrfs_direct_IO() until later.
Therefore, if we fail to allocate the btrfs_dio_private, we complete the
ordered extent range twice. We could fix this by updating
unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
that creating the btrfs_dio_private and submitting the bios are
separate, and once the btrfs_dio_private is created, cleanup always
happens through the btrfs_dio_private.
The logic around unsubmitted_oe_range_end and unsubmitted_oe_range_start
is really subtle. We have the following:
1. btrfs_direct_IO sets those two to the same value.
2. When we call __blockdev_direct_IO unless
btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
modify unsubmitted_oe_range_start so that start < end. Cleanup
won't happen.
3. We come into btrfs_submit_direct - if it dip allocation fails we'd
return with oe_range_end now modified so cleanup will happen.
4. If we manage to allocate the dip we reset the unsubmitted range
members to be equal so that cleanup happens from
btrfs_endio_direct_write.
This 4-step logic is not really obvious, especially given it's scattered
across 3 functions.
Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error") Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Omar Sandoval <[email protected]>
[ add range start/end logic explanation from Nikolay ] Signed-off-by: David Sterba <[email protected]>
btrfs: fix error handling when submitting direct I/O bio
In btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID
stripe or chunk, we submit orig_bio without cloning it. In this case, we
don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we
decrement pending_bios to -1, and we never complete orig_bio. Fix it by
initializing pending_bios to 1 instead of incrementing later.
Fixing this exposes another bug: we put orig_bio prematurely and then
put it again from end_io. Fix it by not putting orig_bio.
After this change, pending_bios is really more of a reference count, but
I'll leave that cleanup separate to keep the fix small.
An upcoming Btrfs fix needs to know the original size of a non-cloned
bios. Rather than accessing the bvec table directly, let's add a
bio_for_each_bvec_all() accessor.
Filipe Manana [Fri, 17 Apr 2020 15:36:50 +0000 (16:36 +0100)]
btrfs: simplify error handling of clean_pinned_extents()
At clean_pinned_extents(), whether we end up returning success or failure,
we pretty much have to do the same things:
1) unlock unused_bg_unpin_mutex
2) decrement reference count on the previous transaction
We also call btrfs_dec_block_group_ro() in case of failure, but that is
better done in its caller, btrfs_delete_unused_bgs(), since its the
caller that calls inc_block_group_ro(), so it should be responsible for
the decrement operation, as it is in case any of the other functions it
calls fail.
So move the call to btrfs_dec_block_group_ro() from clean_pinned_extents()
into btrfs_delete_unused_bgs() and unify the error and success return
paths for clean_pinned_extents(), reducing duplicated code and making it
simpler.
Nikolay Borisov [Wed, 15 Apr 2020 12:53:46 +0000 (15:53 +0300)]
btrfs: make btrfs_read_disk_super return struct btrfs_disk_super
Instead of returning both the page and the super block structure, make
btrfs_read_disk_super just return a pointer to struct btrfs_disk_super.
As a result the function signature is simplified. Also,
read_cache_page_gfp can never return NULL so check its return value only
for IS_ERR.
Nikolay Borisov [Fri, 21 Feb 2020 13:11:24 +0000 (15:11 +0200)]
btrfs: use list_for_each_entry_safe in free_reloc_roots
The function always works on a local copy of the reloc root list, which
cannot be modified outside of it so using list_for_each_entry is fine.
Additionally the macro handles empty lists so drop list_empty checks of
callers. No semantic changes.
David Sterba [Tue, 25 Feb 2020 14:05:53 +0000 (15:05 +0100)]
btrfs: don't force read-only after error in drop snapshot
Deleting a subvolume on a full filesystem leads to ENOSPC followed by a
forced read-only. This is not a transaction abort and the filesystem is
otherwise ok, so the error should be just propagated to the callers.
This is caused by unnecessary call to btrfs_handle_fs_error for all
errors, except EAGAIN. This does not make sense as the standard
transaction abort mechanism is in btrfs_drop_snapshot so all relevant
failures are handled.
Originally in commit cb1b69f4508a ("Btrfs: forced readonly when
btrfs_drop_snapshot() fails") there was no return value at all, so the
btrfs_std_error made some sense but once the error handling and
propagation has been implemented we don't need it anymore.
Filipe Manana [Tue, 7 Apr 2020 10:38:58 +0000 (11:38 +0100)]
btrfs: remove pointless assertion on reclaim_size counter
The reclaim_size counter of a space_info object is unsigned. So its value
can never be negative, it's pointless to have an assertion that checks
its value is >= 0, therefore remove it.
Zheng Wei [Mon, 16 Mar 2020 03:45:57 +0000 (11:45 +0800)]
btrfs: tree-checker: remove duplicate definition of 'inode_item_err'
Remove the duplicate definition of 'inode_item_err' in the file
tree-checker.c that got there by accident in c23c77b097dc ("btrfs:
tree-checker: Refactor inode key check into seperate function").
Josef Bacik [Fri, 13 Mar 2020 19:28:48 +0000 (15:28 -0400)]
btrfs: force chunk allocation if our global rsv is larger than metadata
Nikolay noticed a bunch of test failures with my global rsv steal
patches. At first he thought they were introduced by them, but they've
been failing for a while with 64k nodes.
The problem is with 64k nodes we have a global reserve that calculates
out to 13MiB on a freshly made file system, which only has 8MiB of
metadata space. Because of changes I previously made we no longer
account for the global reserve in the overcommit logic, which means we
correctly allow overcommit to happen even though we are already
overcommitted.
However in some corner cases, for example btrfs/170, we will allocate
the entire file system up with data chunks before we have enough space
pressure to allocate a metadata chunk. Then once the fs is full we
ENOSPC out because we cannot overcommit and the global reserve is taking
up all of the available space.
The most ideal way to deal with this is to change our space reservation
stuff to take into account the height of the tree's that we're
modifying, so that our global reserve calculation does not end up so
obscenely large.
However that is a huge undertaking. Instead fix this by forcing a chunk
allocation if the global reserve is larger than the total metadata
space. This gives us essentially the same behavior that happened
before, we get a chunk allocated and these tests can pass.
This is meant to be a stop-gap measure until we can tackle the "tree
height only" project.
Josef Bacik [Fri, 13 Mar 2020 19:58:09 +0000 (15:58 -0400)]
btrfs: run btrfs_try_granting_tickets if a priority ticket fails
With normal tickets we could have a large reservation at the front of
the list that is unable to be satisfied, but a smaller ticket later on
that can be satisfied. The way we handle this is to run
btrfs_try_granting_tickets() in maybe_fail_all_tickets().
However no such protection exists for priority tickets. Fix this by
handling it in handle_reserve_ticket(). If we've returned after
attempting to flush space in a priority related way, we'll still be on
the priority list and need to be removed.
We rely on the flushing to free up space and wake the ticket, but if
there is not enough space to reclaim _but_ there's enough space in the
space_info to handle subsequent reservations then we would have gotten
an ENOSPC erroneously.
Address this by catching where we are still on the list, meaning we were
a priority ticket, and removing ourselves and then running
btrfs_try_granting_tickets(). This will handle this particular corner
case.
Josef Bacik [Fri, 13 Mar 2020 19:58:08 +0000 (15:58 -0400)]
btrfs: only check priority tickets for priority flushing
In debugging a generic/320 failure on ppc64, Nikolay noticed that
sometimes we'd ENOSPC out with plenty of space to reclaim if we had
committed the transaction. He further discovered that this was because
there was a priority ticket that was small enough to fit in the free
space currently in the space_info.
Consider the following scenario. There is no more space to reclaim in
the fs without committing the transaction. Assume there's 1MiB of space
free in the space info, but there are pending normal tickets with 2MiB
reservations.
Now a priority ticket comes in with a .5MiB reservation. Because we
have normal tickets pending we add ourselves to the priority list,
despite the fact that we could satisfy this reservation.
The flushing machinery now gets to the point where it wants to commit
the transaction, but because there's a .5MiB ticket on the priority list
and we have 1MiB of free space we assume the ticket will be granted
soon, so we bail without committing the transaction.
Meanwhile the priority flushing does not commit the transaction, and
eventually fails with an ENOSPC. Then all other tickets are failed with
ENOSPC because we were never able to actually commit the transaction.
The fix for this is we should have simply granted the priority flusher
his reservation, because there was space to make the reservation.
Priority flushers by definition take priority, so they are allowed to
make their reservations before any previous normal tickets. By not
adding this priority ticket to the list the normal flushing mechanisms
will then commit the transaction and everything will continue normally.
We still need to serialize ourselves with other priority tickets, so if
there are any tickets on the priority list then we need to add ourselves
to that list in order to maintain the serialization between priority
tickets.
Josef Bacik [Fri, 13 Mar 2020 19:58:07 +0000 (15:58 -0400)]
btrfs: account for trans_block_rsv in may_commit_transaction
On ppc64le with 64k page size (respectively 64k block size) generic/320
was failing and debug output showed we were getting a premature ENOSPC
with a bunch of space in btrfs_fs_info::trans_block_rsv.
This meant there were still open transaction handles holding space, yet
the flusher didn't commit the transaction because it deemed the freed
space won't be enough to satisfy the current reserve ticket. Fix this
by accounting for space in trans_block_rsv when deciding whether the
current transaction should be committed or not.
Josef Bacik [Fri, 13 Mar 2020 19:58:06 +0000 (15:58 -0400)]
btrfs: allow to use up to 90% of the global block rsv for unlink
We previously had a limit of stealing 50% of the global reserve for
unlink. This was from a time when the global reserve was used for the
delayed refs as well. However now those reservations are kept separate,
so the global reserve can be depleted much more to allow us to make
progress for space restoring operations like unlink. Change the minimum
amount of space required to be left in the global reserve to 10%.
Josef Bacik [Fri, 13 Mar 2020 19:58:05 +0000 (15:58 -0400)]
btrfs: improve global reserve stealing logic
For unlink transactions and block group removal
btrfs_start_transaction_fallback_global_rsv will first try to start an
ordinary transaction and if it fails it will fall back to reserving the
required amount by stealing from the global reserve. This is problematic
because of all the same reasons we had with previous iterations of the
ENOSPC handling, thundering herd. We get a bunch of failures all at
once, everybody tries to allocate from the global reserve, some win and
some lose, we get an ENSOPC.
Fix this behavior by introducing BTRFS_RESERVE_FLUSH_ALL_STEAL. It's
used to mark unlink reservation. To fix this we need to integrate this
logic into the normal ENOSPC infrastructure. We still go through all of
the normal flushing work, and at the moment we begin to fail all the
tickets we try to satisfy any tickets that are allowed to steal by
stealing from the global reserve. If this works we start the flushing
system over again just like we would with a normal ticket satisfaction.
This serializes our global reserve stealing, so we don't have the
thundering herd problem.
Qu Wenruo [Mon, 16 Mar 2020 06:10:01 +0000 (14:10 +0800)]
btrfs: backref: distinguish reloc and non-reloc use of indirect resolution
For relocation tree detection, relocation backref cache uses
btrfs_should_ignore_reloc_root() which uses relocation-specific checks
like checking the DEAD_RELOC_ROOT bit.
However for general purpose backref cache, we can rely on that check, as
it's possible that relocation is also running.
For generic purposed backref cache, we detect reloc root by
SHARED_BLOCK_REF item. Only reloc root node has its parent bytenr
pointing back to itself.
And in that case, backref cache will mark the reloc root node useless,
dropping any child orphan nodes.
So only call btrfs_should_ignore_reloc_root() if the backref cache is
for relocation.
Qu Wenruo [Tue, 3 Mar 2020 06:26:02 +0000 (14:26 +0800)]
btrfs: backref: rename and move should_ignore_root()
This function is mostly single purpose to relocation backref cache, but
since we're moving the main part of backref cache to backref.c, we need
to export such function.
And to avoid confusion, rename the function to
btrfs_should_ignore_reloc_root() make the name a little more clear.
Qu Wenruo [Thu, 26 Mar 2020 06:11:09 +0000 (14:11 +0800)]
btrfs: rename tree_entry to rb_simple_node and export it
Structure tree_entry provides a very simple rb_tree which only uses
bytenr as search index.
That tree_entry is used in 3 structures: backref_node, mapping_node and
tree_block.
Since we're going to make backref_node independnt from relocation, it's
a good time to extract the tree_entry into rb_simple_node, and export it
into misc.h.
Qu Wenruo [Mon, 24 Feb 2020 01:19:02 +0000 (09:19 +0800)]
btrfs: reloc: use wrapper to replace open-coded edge linking
Since backref_edge is used to connect upper and lower backref nodes, and
needs to access both nodes, some code can look pretty nasty:
list_add_tail(&edge->list[LOWER], &cur->upper);
The above code will link @cur to the LOWER side of the edge, while both
"LOWER" and "upper" words show up. This can sometimes be very confusing
for reader to grasp.
This patch introduces a new wrapper, link_backref_edge(), to handle the
linking behavior. Which also has extra ASSERT() to ensure caller won't
pass wrong nodes.
Also, this updates the comment of related lists of backref_node and
backref_edge, to make it more clear that each list points to what.
Qu Wenruo [Fri, 6 Mar 2020 06:04:12 +0000 (14:04 +0800)]
btrfs: reloc: make reloc root search-specific for relocation backref cache
find_reloc_root() searches reloc_control::reloc_root_tree to find the
reloc root. This behavior is only useful for relocation backref cache.
For the incoming more generic purpose backref cache, we don't care
about who owns the reloc root, but only care if it's a reloc root.
So this patch makes the following modifications to make the reloc root
search more specific to relocation backref:
- Add backref_node::is_reloc_root
This will be an extra indicator for generic purposed backref cache.
User doesn't need to read root key from backref_node::root to
determine if it's a reloc root.
Also for reloc tree root, it's useless and will be queued to useless
list.
- Add backref_cache::is_reloc
This will allow backref cache code to do different behavior for
generic purpose backref cache and relocation backref cache.
- Pass fs_info to find_reloc_root()
- Export find_reloc_root()
So backref.c can utilize this function.
Qu Wenruo [Thu, 20 Feb 2020 07:16:16 +0000 (15:16 +0800)]
btrfs: reloc: rename mark_block_processed and __mark_block_processed
These two functions are weirdly named, mark_block_processed() in fact
just marks a range dirty unconditionally, while __mark_block_processed()
does extra check before doing the marking.
This patch will open code old mark_block_processed, and rename
__mark_block_processed() to remove the "__" prefix.
Since we're here, also kill the forward declaration, which could also
kill in_block_group() with in_range() macro.
Qu Wenruo [Thu, 13 Feb 2020 06:11:04 +0000 (14:11 +0800)]
btrfs: backref: introduce the skeleton of btrfs_backref_iter
Due to the complex nature of btrfs extent tree, when we want to iterate
all backrefs of one extent, this involves quite a lot of work, like
searching the EXTENT_ITEM/METADATA_ITEM, iteration through inline and keyed
backrefs.
Normally this would result in a complex code, something like:
btrfs_search_slot()
/* Ensure we are at EXTENT_ITEM/METADATA_ITEM */
while (1) { /* Loop for extent tree items */
while (ptr < end) { /* Loop for inlined items */
/* Real work here */
}
next:
ret = btrfs_next_item()
/* Ensure we're still at keyed item for specified bytenr */
}
The idea of btrfs_backref_iter is to avoid such complex and hard to
read code structure, but something like the following:
iter = btrfs_backref_iter_alloc();
ret = btrfs_backref_iter_start(iter, bytenr);
if (ret < 0)
goto out;
for (; ; ret = btrfs_backref_iter_next(iter)) {
/* Real work here */
}
out:
btrfs_backref_iter_free(iter);
This patch is just the skeleton + btrfs_backref_iter_start() code.
Linus Torvalds [Sun, 24 May 2020 17:24:10 +0000 (10:24 -0700)]
Merge tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
"A set of EFI fixes:
- Don't return a garbage screen info when EFI framebuffer is not
available
- Make the early EFI console work properly with wider fonts instead
of drawing garbage
- Prevent a memory buffer leak in allocate_e820()
- Print the firmware error record properly so it can be decoded by
users
- Fix a symbol clash in the host tool build which only happens with
newer compilers.
- Add a missing check for the event log version of TPM which caused
boot failures on several Dell systems due to an attempt to decode
SHA-1 format with the crypto agile algorithm"
* tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tpm: check event log version before reading final events
efi: Pull up arch-specific prototype efi_systab_show_arch()
x86/boot: Mark global variables as static
efi: cper: Add support for printing Firmware Error Record Reference
efi/libstub/x86: Avoid EFI map buffer alloc in allocate_e820()
efi/earlycon: Fix early printk for wider fonts
efi/libstub: Avoid returning uninitialized data from setup_graphics()
Linus Torvalds [Sun, 24 May 2020 17:21:02 +0000 (10:21 -0700)]
Merge tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Unbreak stack dumps for inactive tasks by interpreting the special
first frame left by __switch_to_asm() correctly.
The recent change not to skip the first frame so ORC and frame
unwinder behave in the same way caused all entries to be
unreliable, i.e. prepended with '?'.
- Use cpumask_available() instead of an implicit NULL check of a
cpumask_var_t in mmio trace to prevent a Clang build warning"
* tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
Linus Torvalds [Sun, 24 May 2020 17:14:58 +0000 (10:14 -0700)]
Merge tag 'sched-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A set of fixes for the scheduler:
- Fix handling of throttled parents in enqueue_task_fair() completely.
The recent fix overlooked a corner case where the first iteration
terminates due to an entity already being on the runqueue which
makes the list management incomplete and later triggers the
assertion which checks for completeness.
- Fix a similar problem in unthrottle_cfs_rq().
- Show the correct uclamp values in procfs which prints the effective
value twice instead of requested and effective"
* tag 'sched-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
sched/debug: Fix requested task uclamp values shown in procfs
sched/fair: Fix enqueue_task_fair() warning some more
1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
Bhowmik.
2) Nexthop attributes aren't being checked properly because of
mis-initialized iterator, from David Ahern.
3) Revert iop_idents_reserve() change as it caused performance
regressions and was just working around what is really a UBSAN bug
in the compiler. From Yuqi Jin.
4) Read MAC address properly from ROM in bmac driver (double iteration
proceeds past end of address array), from Jeremy Kerr.
5) Add Microsoft Surface device IDs to r8152, from Marc Payne.
6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
Boris Sukholitko.
7) Fix ACK discard behavior in rxrpc, from David Howells.
8) Preserve flow hash across packet scrubbing in wireguard, from Jason
A. Donenfeld.
9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
Dumazet.
10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.
11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.
12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.
13) Fix use after free in mlxsw driver, from Jiri Pirko.
14) Order kTLS key destruction properly in mlx5 driver, from Tariq
Toukan.
15) Check devm_platform_ioremap_resource() return value properly in
several drivers, from Tiezhu Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
net: smsc911x: Fix runtime PM imbalance on error
net/mlx4_core: fix a memory leak bug.
net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
net: phy: mscc: fix initialization of the MACsec protocol mode
net: stmmac: don't attach interface until resume finishes
net: Fix return value about devm_platform_ioremap_resource()
net/mlx5: Fix error flow in case of function_setup failure
net/mlx5e: CT: Correctly get flow rule
net/mlx5e: Update netdev txq on completions during closure
net/mlx5: Annotate mutex destroy for root ns
net/mlx5: Don't maintain a case of del_sw_func being null
net/mlx5: Fix cleaning unmanaged flow tables
net/mlx5: Fix memory leak in mlx5_events_init
net/mlx5e: Fix inner tirs handling
net/mlx5e: kTLS, Destroy key object after destroying the TIS
net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
net/mlx5: Avoid processing commands before cmdif is ready
net/mlx5: Fix a race when moving command interface to events mode
net/mlx5: Add command entry handling completion
rxrpc: Fix a memory leak in rxkad_verify_response()
...
David S. Miller [Sat, 23 May 2020 23:39:45 +0000 (16:39 -0700)]
Merge tag 'mlx5-fixes-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2020-05-22
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.13
('net/mlx5: Add command entry handling completion')
For -stable v5.2
('net/mlx5: Fix error flow in case of function_setup failure')
('net/mlx5: Fix memory leak in mlx5_events_init')
For -stable v5.3
('net/mlx5e: Update netdev txq on completions during closure')
('net/mlx5e: kTLS, Destroy key object after destroying the TIS')
('net/mlx5e: Fix inner tirs handling')
For -stable v5.6
('net/mlx5: Fix cleaning unmanaged flow tables')
('net/mlx5: Fix a race when moving command interface to events mode')
====================
Qiushi Wu [Fri, 22 May 2020 19:07:15 +0000 (14:07 -0500)]
net/mlx4_core: fix a memory leak bug.
In function mlx4_opreq_action(), pointer "mailbox" is not released,
when mlx4_cmd_box() return and error, causing a memory leak bug.
Fix this issue by going to "out" label, mlx4_free_cmd_mailbox() can
free this pointer.
Fixes: fe6f700d6cbb ("net/mlx4_core: Respond to operation request by firmware") Signed-off-by: Qiushi Wu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
vlan_for_each() are required to be called with rtnl_lock taken, otherwise
ASSERT_RTNL() warning will be triggered - which happens now during System
resume from suspend:
cpsw_suspend()
|- cpsw_ndo_stop()
|- __hw_addr_ref_unsync_dev()
|- cpsw_purge_all_mc()
|- vlan_for_each()
|- ASSERT_RTNL();
Hence, fix it by surrounding cpsw_ndo_stop() by rtnl_lock/unlock() calls.
Fixes: 15180eca569b ("net: ethernet: ti: cpsw: fix vlan mcast") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Fri, 22 May 2020 15:55:45 +0000 (17:55 +0200)]
net: phy: mscc: fix initialization of the MACsec protocol mode
At the very end of the MACsec block initialization in the MSCC PHY
driver, the MACsec "protocol mode" is set. This setting should be set
based on the PHY id within the package, as the bank used to access the
register used depends on this. This was not done correctly, and only the
first bank was used leading to the two upper PHYs being unstable when
using the VSC8584. This patch fixes it.
Fixes: 1bbe0ecc2a1a ("net: phy: mscc: macsec initialization") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Leon Yu [Fri, 22 May 2020 15:29:43 +0000 (23:29 +0800)]
net: stmmac: don't attach interface until resume finishes
Commit 14b41a2959fb ("net: stmmac: Delete txtimer in suspend") was the
first attempt to fix a race between mod_timer() and setup_timer()
during stmmac_resume(). However the issue still exists as the commit
only addressed half of the issue.
Same race can still happen as stmmac_resume() re-attaches interface
way too early - even before hardware is fully initialized. Worse,
doing so allows network traffic to restart and stmmac_tx_timer_arm()
being called in the middle of stmmac_resume(), which re-init tx timers
in stmmac_init_coalesce(). timer_list will be corrupted and system
crashes as a result of race between mod_timer() and setup_timer().
Mike Rapoport [Sat, 23 May 2020 19:57:18 +0000 (22:57 +0300)]
sparc32: fix page table traversal in srmmu_nocache_init()
The srmmu_nocache_init() uses __nocache_fix() macro to add an offset to
page table entry to access srmmu_nocache_pool.
But since sparc32 has only three actual page table levels, pgd, p4d and
pud are essentially the same thing and pgd_offset() and p4d_offset() are
no-ops, the __nocache_fix() should be done only at PUD level.
Remove __nocache_fix() for p4d_offset() and pud_offset() and keep it
only for PUD and lower levels.
Linus Torvalds [Sat, 23 May 2020 18:21:47 +0000 (11:21 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <[email protected]>:
MAINTAINERS: add files related to kdump
z3fold: fix use-after-free when freeing handles
sparc32: use PUD rather than PGD to get PMD in srmmu_nocache_init()
MAINTAINERS: update email address for Naoya Horiguchi
sh: include linux/time_types.h for sockios
kasan: disable branch tracing for core runtime
selftests/vm/write_to_hugetlbfs.c: fix unused variable warning
selftests/vm/.gitignore: add mremap_dontunmap
rapidio: fix an error in get_user_pages_fast() error handling
x86: bitops: fix build regression
device-dax: don't leak kernel memory to user space after unloading kmem
Linus Torvalds [Sat, 23 May 2020 18:06:13 +0000 (11:06 -0700)]
Merge tag 'driver-core-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"So, turns out the kobject fix didn't quite work, so here are four
patches that in the end, result in just two driver core fixes for
reported issues that no one has had problems with.
The kobject patch that was originally in here has now been reverted,
as Guenter reported boot problems with it on some of his systems"
* tag 'driver-core-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "kobject: Make sure the parent does not get released before its children"
kobject: Make sure the parent does not get released before its children
driver core: Fix handling of SYNC_STATE_ONLY + STATELESS device links
driver core: Fix SYNC_STATE_ONLY device link implementation
Linus Torvalds [Sat, 23 May 2020 18:02:42 +0000 (11:02 -0700)]
Merge tag 'char-misc-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.7-rc7 that resolve
some reported issues. Included in here are tiny fixes for the mei,
coresight, rtsx, ipack, and mhi drivers.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Add short delay after exit from ASPM
bus: mhi: core: Fix some error return code
ipack: tpci200: fix error return code in tpci200_register()
coresight: cti: remove incorrect NULL return check
mei: release me_cl object reference
Linus Torvalds [Sat, 23 May 2020 17:57:55 +0000 (10:57 -0700)]
Merge tag 'staging-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio fixes from Greg KH:
"Here are some small staging and IIO driver fixes for 5.7-rc7
Nothing major, just a collection of IIO driver fixes for reported
issues, and a few small staging driver fixes that people have found.
Full details are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wfx: unlock on error path
staging: greybus: Fix uninitialized scalar variable
staging: kpc2000: fix error return code in kp2000_pcie_probe()
iio: sca3000: Remove an erroneous 'get_device()'
iio: adc: stm32-dfsdm: fix device used to request dma
iio: adc: stm32-adc: fix device used to request dma
iio: adc: ti-ads8344: Fix channel selection
staging: iio: ad2s1210: Fix SPI reading
iio: dac: vf610: Fix an error handling path in 'vf610_dac_probe()'
iio: imu: st_lsm6dsx: unlock on error in st_lsm6dsx_shub_write_raw()
iio: chemical: atlas-sensor: correct DO-SM channels
Linus Torvalds [Sat, 23 May 2020 17:50:38 +0000 (10:50 -0700)]
Merge tag 'tty-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fix from Greg KH:
"Here is a single serial driver fix for 5.7-rc7. It resolves an issue
with the SiFive serial console init sequence that was reported a
number of times.
It has been in linux-next for a while now with no reported issues"
* tag 'tty-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: add missing spin_lock_init for SiFive serial console
Linus Torvalds [Sat, 23 May 2020 17:42:12 +0000 (10:42 -0700)]
Merge tag 's390-5.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Add missing R_390_JMP_SLOT relocation type in KASLR code.
- Fix set_huge_pte_at for empty ptes issue which has been uncovered
with arch page table helper tests.
- Correct initrd location for kdump kernel.
- Fix s390_mmio_read/write with MIO in PCI code.
* tag 's390-5.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kaslr: add support for R_390_JMP_SLOT relocation type
s390/mm: fix set_huge_pte_at() for empty ptes
s390/kexec_file: fix initrd location for kdump kernel
s390/pci: Fix s390_mmio_read/write with MIO