Git Repo - linux.git/log

]> Git Repo - linux.git/log

Nikolay Borisov [Thu, 30 Jan 2020 12:59:44 +0000 (14:59 +0200)]

btrfs: Implement DREW lock

A (D)ouble (R)eader (W)riter (E)xclustion lock is a locking primitive
that allows to have multiple readers or multiple writers but not
multiple readers and writers holding it concurrently.

The code is factored out from the existing open-coded locking scheme
used to exclude pending snapshots from nocow writers and vice-versa.
Current implementation actually favors Readers (that is snapshot
creaters) to writers (nocow writers of the filesystem).

The API provides lock/unlock/trylock for reads and writes.

Formal specification for TLA+ provided by Valentin Schneider is at
https://lore.kernel.org/linux-btrfs/2dcaf81c-f0d3-409e-cb29-733d8b3b4cc9@arm.com/

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Tue, 11 Feb 2020 15:10:22 +0000 (00:10 +0900)]

btrfs: simplify error handling in __btrfs_write_out_cache()

The error cleanup gotos in __btrfs_write_out_cache() needlessly jump
back making the code less readable then needed. Flatten them out so no
back-jump is necessary and the read flow is uninterrupted.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Tue, 11 Feb 2020 15:10:21 +0000 (00:10 +0900)]

btrfs: use standard debug config option to enable free-space-cache debug prints

free-space-cache.c has it's own set of DEBUG ifdefs which need to be
turned on instead of the global CONFIG_BTRFS_DEBUG to print debug
messages about failed block-group writes.

Switch this over to CONFIG_BTRFS_DEBUG so we always see these messages
when running a debug kernel.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Tue, 11 Feb 2020 15:10:20 +0000 (00:10 +0900)]

btrfs: make the uptodate argument of io_ctl_add_pages() boolean

Make the uptodate argument of io_ctl_add_pages() boolean.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Tue, 11 Feb 2020 15:10:19 +0000 (00:10 +0900)]

btrfs: use inode from io_ctl in io_ctl_prepare_pages

io_ctl_prepare_pages() gets a 'struct btrfs_io_ctl' as well as a 'struct
inode', but btrfs_io_ctl::inode points to the same struct inode as this is
assgined in io_ctl_init().

Use the inode form io_ctl to reduce the arguments of io_ctl_prepare_pages.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Marcos Paulo de Souza [Fri, 7 Feb 2020 13:05:46 +0000 (10:05 -0300)]

btrfs: add new BTRFS_IOC_SNAP_DESTROY_V2 ioctl

This ioctl will be responsible for deleting a subvolume using its id.
This can be used when a system has a file system mounted from a
subvolume, rather than the root file system, like below:

/
@subvol1/
@subvol2/
@subvol_default/

If only @subvol_default is mounted, we have no path to reach @subvol1
and @subvol2, thus no way to delete them. Current subvolume delete ioctl
takes a file handle point as argument, and if @subvol_default is
mounted, we can't reach @subvol1 and @subvol2 from the same mount point.

This patch introduces a new ioctl BTRFS_IOC_SNAP_DESTROY_V2 that takes
the extended structure with flags to allow to delete subvolume using
subvolid.

Now, we can use this new ioctl specifying the subvolume id and refer to
the same mount point. It doesn't matter which subvolume was mounted,
since we can reach to the desired one using the subvolume id, and then
delete it.

The full path to the subvolume id is resolved internally and access is
verified as if the subvolume was accessed by path.

The volume args v2 structure is extended to use the existing union for
subvolume id specification, that's valid in case the
BTRFS_SUBVOL_SPEC_BY_ID is set.

Signed-off-by: Marcos Paulo de Souza <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Marcos Paulo de Souza [Fri, 21 Feb 2020 13:56:12 +0000 (14:56 +0100)]

btrfs: export helpers for subvolume name/id resolution

The functions will be used outside of export.c and super.c to allow
resolving subvolume name from a given id, eg. for subvolume deletion by
id ioctl.

Signed-off-by: Marcos Paulo de Souza <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ split from the next patch ]
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Fri, 21 Feb 2020 12:30:14 +0000 (13:30 +0100)]

btrfs: use ioctl args support mask for device delete

When the device remove v2 ioctl was added, the full support mask was
added to sanity check the flags. However this would allow to let the
subvolume related flags to be accepted. This is not supposed to happen.

Use the correct support mask, which means that now any of
BTRFS_SUBVOL_CREATE_ASYNC, BTRFS_SUBVOL_RDONLY or
BTRFS_SUBVOL_QGROUP_INHERIT will be rejected as ENOTSUPP. Though this is
a user-visible change, specifying subvolume flags for device deletion
does not make sense and there are hopefully no applications doing that.

Reviewed-by: Marcos Paulo de Souza <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Fri, 21 Feb 2020 12:24:37 +0000 (13:24 +0100)]

btrfs: use ioctl args support mask for subvolume create/delete

Using the defined mask instead of flag enumeration in the ioctl handler
is preferred. No functional changes.

Reviewed-by: Marcos Paulo de Souza <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Fri, 21 Feb 2020 12:16:33 +0000 (13:16 +0100)]

btrfs: define support masks for ioctl volume args v2

The ioctl data for devices or subvolumes can be passed via
btrfs_ioctl_vol_args or btrfs_ioctl_vol_args_v2. The latter is more
versatile and needs some caution as some of the flags make sense only
for some ioctls.

As we're going to extend the flags, define support masks for each ioctl
class separately.

Reviewed-by: Marcos Paulo de Souza <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Jules Irenge [Sun, 23 Feb 2020 23:16:42 +0000 (23:16 +0000)]

btrfs: Add missing lock annotation for release_extent_buffer()

Sparse reports a warning at release_extent_buffer()
warning: context imbalance in release_extent_buffer() - unexpected unlock

The root cause is the missing annotation at release_extent_buffer()
Add the missing __releases(&eb->refs_lock) annotation

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Jules Irenge <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 14 Feb 2020 20:22:06 +0000 (15:22 -0500)]

btrfs: set update the uuid generation as soon as possible

In my EIO stress testing I noticed I was getting forced to rescan the
uuid tree pretty often, which was weird.  This is because my error
injection stuff would sometimes inject an error after log replay but
before we loaded the UUID tree.  If log replay committed the transaction
it wouldn't have updated the uuid tree generation, but the tree was
valid and didn't change, so there's no reason to not update the
generation here.

Fix this by setting the BTRFS_FS_UPDATE_UUID_TREE_GEN bit immediately
after reading all the fs roots if the uuid tree generation matches the
fs generation.  Then any transaction commits that happen during mount
won't screw up our uuid tree state, forcing us to do needless uuid
rescans.

Fixes: 70f801754728 ("Btrfs: check UUID tree during mount if required")
CC: [email protected] # 4.19+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 14 Feb 2020 20:05:01 +0000 (15:05 -0500)]

btrfs: bail out of uuid tree scanning if we're closing

In doing my fsstress+EIO stress testing I started running into issues
where umount would get stuck forever because the uuid checker was
chewing through the thousands of subvolumes I had created.

We shouldn't block umount on this, simply bail if we're unmounting the
fs. We need to make sure we don't mark the UUID tree as ok, so we only
set that bit if we made it through the whole rescan operation, but
otherwise this is completely safe.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Tue, 18 Feb 2020 14:56:08 +0000 (16:56 +0200)]

btrfs: make btrfs_check_uuid_tree private to disk-io.c

It's used only during filesystem mount as such it can be made private to
disk-io.c file. Also use the occasion to move btrfs_uuid_rescan_kthread
as btrfs_check_uuid_tree is its sole caller.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Tue, 18 Feb 2020 14:56:07 +0000 (16:56 +0200)]

btrfs: call btrfs_check_uuid_tree_entry directly in btrfs_uuid_tree_iterate

btrfs_uuid_tree_iterate is called from only once place and its 2nd
argument is always btrfs_check_uuid_tree_entry. Simplify
btrfs_uuid_tree_iterate's signature by removing its 2nd argument and
directly calling btrfs_check_uuid_tree_entry. Also move the latter into
uuid-tree.h. No functional changes.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 19 Feb 2020 14:17:20 +0000 (15:17 +0100)]

btrfs: raid56: simplify tracking of Q stripe presence

There are temporary variables tracking the index of P and Q stripes, but
none of them is really used as such, merely for determining if the Q
stripe is present. This leads to compiler warnings with
-Wunused-but-set-variable and has been reported several times.

fs/btrfs/raid56.c: In function ‘finish_rmw’:
fs/btrfs/raid56.c:1199:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable]
1199 |  int p_stripe = -1;
      |      ^~~~~~~~
fs/btrfs/raid56.c: In function ‘finish_parity_scrub’:
fs/btrfs/raid56.c:2356:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable]
2356 |  int p_stripe = -1;
      |      ^~~~~~~~

Replace the two variables with one that has a clear meaning and also get
rid of the warnings. The logic that verifies that there are only 2
valid cases is unchanged.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

ethanwu [Fri, 7 Feb 2020 09:38:18 +0000 (17:38 +0800)]

btrfs: backref, use correct count to resolve normal data refs

With the following patches:

- btrfs: backref, only collect file extent items matching backref offset
- btrfs: backref, not adding refs from shared block when resolving normal backref
- btrfs: backref, only search backref entries from leaves of the same root

we only collect the normal data refs we want, so the imprecise upper
bound total_refs of that EXTENT_ITEM could now be changed to the count
of the normal backref entry we want to search.

Background and how the patches fit together:

Btrfs has two types of data backref.
For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
exact block number. Therefore, we need to call resolve_indirect_refs.
It uses btrfs_search_slot to locate the leaf block. Then
we need to walk through the leaves to search for the EXTENT_DATA items
that have disk bytenr matching the extent item (add_all_parents).

When resolving indirect refs, we could take entries that don't
belong to the backref entry we are searching for right now.
For that reason when searching backref entry, we always use total
refs of that EXTENT_ITEM rather than individual count.

For example:
item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
  extent refs 24 gen 7302 flags DATA
  shared data backref parent 394985472 count 10 #1
  extent data backref root 257 objectid 260 offset 1048576 count 3 #2
  extent data backref root 256 objectid 260 offset 65536 count 6 #3
  extent data backref root 257 objectid 260 offset 65536 count 5 #4

For example, when searching backref entry #4, we'll use total_refs
24, a very loose loop ending condition, instead of total_refs = 5.

But using total_refs = 24 is not accurate. Sometimes, we'll never find
all the refs from specific root.  As a result, the loop keeps on going
until we reach the end of that inode.

The first 3 patches, handle 3 different types refs we might encounter.
These refs do not belong to the normal backref we are searching, and
hence need to be skipped.

This patch changes the total_refs to correct number so that we could
end loop as soon as we find all the refs we want.

btrfs send uses backref to find possible clone sources, the following
is a simple test to compare the results with and without this patch:

$ btrfs subvolume create /sub1
$ for i in `seq 1 163840`; do
     dd if=/dev/zero of=/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct
   done
$ btrfs subvolume snapshot /sub1 /sub2
$ for i in `seq 1 163840`; do
     dd if=/dev/zero of=/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct
   done
$ btrfs subvolume snapshot -r /sub1 /snap1
$ time btrfs send /snap1 | btrfs receive /volume2

Without this patch:

real 69m48.124s
user 0m50.199s
sys  70m15.600s

With this patch:

real    1m59.683s
user    0m35.421s
sys     2m42.684s

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: ethanwu <[email protected]>
[ add patchset cover letter with background and numbers ]
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

ethanwu [Fri, 7 Feb 2020 09:38:17 +0000 (17:38 +0800)]

btrfs: backref, only search backref entries from leaves of the same root

We could have some nodes/leaves in subvolume whose owner are not the
that subvolume. In this way, when we resolve normal backrefs of that
subvolume, we should avoid collecting those references from these blocks.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: ethanwu <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

ethanwu [Fri, 7 Feb 2020 09:38:16 +0000 (17:38 +0800)]

btrfs: backref, don't add refs from shared block when resolving normal backref

All references from the block of SHARED_DATA_REF belong to that shared
block backref.

For example:

  item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
      extent refs 24 gen 7302 flags DATA
      extent data backref root 257 objectid 260 offset 65536 count 5
      extent data backref root 258 objectid 265 offset 0 count 9
      shared data backref parent 394985472 count 10

Block 394985472 might be leaf from root 257, and the item obejctid and
(file_pos - file_extent_item::offset) in that leaf just happens to be
260 and 65536 which is equal to the first extent data backref entry.

Before this patch, when we resolve backref:

  root 257 objectid 260 offset 65536

we will add those refs in block 394985472 and wrongly treat those as the
refs we want.

Fix this by checking if the leaf we are processing is shared data
backref, if so, just skip this leaf.

Shared data refs added into preftrees.direct have all entry value = 0
(root_id = 0, key = NULL, level = 0) except parent entry.

Other refs from indirect tree will have key value and root id != 0, and
these values won't be changed when their parent is resolved and added to
preftrees.direct. Therefore, we could reuse the preftrees.direct and
search ref with all values = 0 except parent is set to avoid getting
those resolved refs block.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: ethanwu <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

ethanwu [Fri, 7 Feb 2020 09:38:15 +0000 (17:38 +0800)]

btrfs: backref, only collect file extent items matching backref offset

When resolving one backref of type EXTENT_DATA_REF, we collect all
references that simply reference the EXTENT_ITEM even though their
(file_pos - file_extent_item::offset) are not the same as the
btrfs_extent_data_ref::offset we are searching for.

This patch adds additional check so that we only collect references whose
(file_pos - file_extent_item::offset) == btrfs_extent_data_ref::offset.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: ethanwu <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:36 +0000 (00:24 +0900)]

btrfs: remove buffer_heads form super block mirror integrity checking

The integrity checking code for the super block mirrors is the last
remaining user of buffer_heads, change it to using plain bios as well.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:35 +0000 (00:24 +0900)]

btrfs: remove buffer_heads from btrfsic_process_written_block()

Now that the last caller of btrfsic_process_written_block() with
buffer_heads is gone, remove the buffer_head processing path from it as
well.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:34 +0000 (00:24 +0900)]

btrfs: remove btrfsic_submit_bh()

Now that the last use of btrfsic_submit_bh() is gone as the super block
is now written using bios, remove the function as well.

Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:33 +0000 (00:24 +0900)]

btrfs: use bios instead of buffer_heads from super block writeout

Similar to the superblock read path, change the write path to using bios
and pages instead of buffer_heads. This allows us to skip over the
buffer_head code, for writing the superblock to disk.

This is based on a patch originally authored by Nikolay Borisov.

Co-developed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:32 +0000 (00:24 +0900)]

btrfs: use the page cache for super block reading

Super-block reading in BTRFS is done using buffer_heads. Buffer_heads
have some drawbacks, like not being able to propagate errors from the
lower layers.

Directly use the page cache for reading the super blocks from disk or
invalidating an on-disk super block. We have to use the page cache so to
avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
superblock via pagecache in scan_one_device").

This patch unwraps the buffer head API and does not change the way the
super block is actually read.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:31 +0000 (00:24 +0900)]

btrfs: reduce scope of btrfs_scratch_superblocks()

btrfs_scratch_superblocks() isn't used anywhere outside volumes.c so
remove it from the header file and mark it as static. Also move it
above it's callers so we don't need a forward declaration.

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Johannes Thumshirn [Thu, 13 Feb 2020 15:24:30 +0000 (00:24 +0900)]

btrfs: don't kmap() pages from block devices

Block device mappings are never in highmem so kmap() / kunmap() calls for
pages from block devices are unneeded. Use page_address() instead of
kmap() to get to the virtual addreses.

While we're at it, read_cache_page_gfp() doesn't return NULL on error,
only an ERR_PTR, so use IS_ERR() to check for errors.

Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Thu, 13 Feb 2020 15:24:29 +0000 (00:24 +0900)]

btrfs: Export btrfs_release_disk_super

Preparatory patch for removal of buffer_head usage in btrfs.

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Filipe Manana [Thu, 13 Feb 2020 10:20:02 +0000 (10:20 +0000)]

Btrfs: avoid unnecessary splits when setting bits on an extent io tree

When attempting to set bits on a range of an exent io tree that already
has those bits set we can end up splitting an extent state record, use
the preallocated extent state record, insert it into the red black tree,
do another search on the red black tree, merge the preallocated extent
state record with the previous extent state record, remove that previous
record from the red black tree and then free it. This is all unnecessary
work that consumes time.

This happens specifically at the following case at __set_extent_bit():

  $ cat -n fs/btrfs/extent_io.c
   957  static int __must_check
   958  __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
  (...)
  1044          /*
  1045           *     | ---- desired range ---- |
  1046           * | state |
  1047           *   or
  1048           * | ------------- state -------------- |
  1049           *
  (...)
  1060          if (state->start < start) {
  1061                  if (state->state & exclusive_bits) {
  1062                          *failed_start = start;
  1063                          err = -EEXIST;
  1064                          goto out;
  1065                  }
  1066
  1067                  prealloc = alloc_extent_state_atomic(prealloc);
  1068                  BUG_ON(!prealloc);
  1069                  err = split_state(tree, state, prealloc, start);
  1070                  if (err)
  1071                          extent_io_tree_panic(tree, err);
  1072
  1073                  prealloc = NULL;

So if our extent state represents a range from 0 to 1MiB for example, and
we want to set bits in the range 128KiB to 256KiB for example, and that
extent state record already has all those bits set, we end up splitting
that record, so we end up with extent state records in the tree which
represent the ranges from 0 to 128KiB and from 128KiB to 1MiB. This is
temporary because a subsequent iteration in that function will end up
merging the records.

The splitting requires using the preallocated extent state record, so
a future iteration that needs to do another split will need to allocate
another extent state record in an atomic context, something not ideal
that we try to avoid as much as possible. The splitting also requires
an insertion in the red black tree, and a subsequent merge will require
a deletion from the red black tree and freeing an extent state record.

This change just skips the splitting of an extent state record when it
already has all the bits the we need to set.

Setting a bit that is already set for a range is very common in the
inode's 'file_extent_tree' extent io tree for example, where we keep
setting the EXTENT_DIRTY bit every time we replace an extent.

This change also fixes a bug that happens after the recent patchset from
Josef that avoids having implicit holes after a power failure when not
using the NO_HOLES feature, more specifically the patch with the subject:

  "btrfs: introduce the inode->file_extent_tree"

This patch introduced an extent io tree per inode to keep track of
completed ordered extents and figure out at any time what is the safe
value for the inode's disk_i_size. This assumes that for contiguous
ranges in a file we always end up with a single extent state record in
the io tree, but that is not the case, as there is a short time window
where we can have two extent state records representing contiguous
ranges. When this happens we end setting up an incorrect value for the
inode's disk_i_size, resulting in data loss after a clean unmount
of the filesystem. The following example explains how this can happen.

Suppose we have an inode with an i_size and a disk_i_size of 1MiB, so in
the inode's file_extent_tree we have a single extent state record that
represents the range [0, 1MiB) with the EXTENT_DIRTY bit set. Then the
following steps happen:

1) A buffered write against file range [512KiB, 768KiB) is made. At this
   point delalloc was not flushed yet;

2) Deduplication from some other inode into this inode's range
   [128KiB, 256KiB) is made. This causes btrfs_inode_set_file_extent_range()
   to be called, from btrfs_insert_clone_extent(), to mark the range
   [128KiB, 256KiB) with EXTENT_DIRTY in the inode's file_extent_tree;

3) When btrfs_inode_set_file_extent_range() calls set_extent_bits(), we
   end up at __set_extent_bit(). In the first iteration of that function's
   loop we end up in the following branch:

   $ cat -n fs/btrfs/extent_io.c
    957  static int __must_check
    958  __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
   (...)
   1044          /*
   1045           *     | ---- desired range ---- |
   1046           * | state |
   1047           *   or
   1048           * | ------------- state -------------- |
   1049           *
   (...)
   1060          if (state->start < start) {
   1061                  if (state->state & exclusive_bits) {
   1062                          *failed_start = start;
   1063                          err = -EEXIST;
   1064                          goto out;
   1065                  }
   1066
   1067                  prealloc = alloc_extent_state_atomic(prealloc);
   1068                  BUG_ON(!prealloc);
   1069                  err = split_state(tree, state, prealloc, start);
   1070                  if (err)
   1071                          extent_io_tree_panic(tree, err);
   1072
   1073                  prealloc = NULL;
   (...)
   1089                  goto search_again;

   This splits the state record into two, one for range [0, 128KiB) and
   another for the range [128KiB, 1MiB). Both already have the EXTENT_DIRTY
   bit set. Then we jump to the 'search_again' label, where we unlock the
   the spinlock protecting the extent io tree before jumping to the
   'again' label to perform the next iteration;

4) In the meanwhile, delalloc is flushed, the ordered extent for the range
   [512KiB, 768KiB) is created and when it completes, at
   btrfs_finish_ordered_io(), it calls btrfs_inode_safe_disk_i_size_write()
   with a value of 0 for its 'new_size' argument;

5) Before the deduplication task currently at __set_extent_bit() moves to
   the next iteration, the task finishing the ordered extent calls
   find_first_extent_bit() through btrfs_inode_safe_disk_i_size_write()
   and gets 'start' set to 0 and 'end' set to 128KiB - because at this
   moment the io tree has two extent state records, one representing the
   range [0, 128KiB) and another representing the range [128KiB, 1MiB),
   both with EXTENT_DIRTY set. Then we set 'isize' to:

   isize = min(isize, end + 1)
         = min(1MiB, 128KiB - 1 + 1)
         = 128KiB

   Then we set the inode's disk_i_size to 128KiB (isize).

   After a clean unmount of the filesystem and mounting it again, we have
   the file with a size of 128KiB, and effectively lost all the data it
   had before in the range from 128KiB to 1MiB.

This change fixes that issue too, as we never end up splitting extent
state records when they already have all the bits we want set.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Thu, 13 Feb 2020 15:47:30 +0000 (10:47 -0500)]

btrfs: handle logged extent failure properly

If we're allocating a logged extent we attempt to insert an extent
record for the file extent directly.  We increase
space_info->bytes_reserved, because the extent entry addition will call
btrfs_update_block_group(), which will convert the ->bytes_reserved to
->bytes_used.  However if we fail at any point while inserting the
extent entry we will bail and leave space on ->bytes_reserved, which
will trigger a WARN_ON() on umount.  Fix this by pinning the space if we
fail to insert, which is what happens in every other failure case that
involves adding the extent entry.

CC: [email protected] # 5.4+
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Qu Wenruo [Wed, 12 Feb 2020 07:43:31 +0000 (15:43 +0800)]

btrfs: relocation: Remove is_cowonly_root()

This function is only used in read_fs_root(), which is just a wrapper of
btrfs_get_fs_root().

For all the mentioned essential roots except log root tree,
btrfs_get_fs_root() has its own quick path to grab them from fs_info
directly, thus no need for key.offset modification.

For subvolume trees, btrfs_get_fs_root() with key.offset == -1 is
completely fine.

For log trees and log root tree, it's impossible to hit them, as for
relocation all backrefs are fetched from commit root, which never
records log tree blocks.

Log tree blocks either get freed in regular transaction commit, or
replayed at mount time. At runtime we should never hit an backref for
log tree in extent tree.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:18 +0000 (16:09 +0200)]

btrfs: switch to per-transaction pinned extents

This commit flips the switch to start tracking/processing pinned extents
on a per-transaction basis. It mostly replaces all references from
btrfs_fs_info::(pinned_extents|freed_extents[]) to
btrfs_transaction::pinned_extents.

Two notable modifications that warrant explicit mention are changing
clean_pinned_extents to get a reference to the previously running
transaction. The other one is removal of call to
btrfs_destroy_pinned_extent since transactions are going to be cleaned
in btrfs_cleanup_one_transaction.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:17 +0000 (16:09 +0200)]

btrfs: Factor out pinned extent clean up in btrfs_delete_unused_bgs

Next patch is going to refactor how pinned extents are tracked which
will necessitate changing this code. To ease that work and contain the
changes factor the code now in preparation, this will also help review.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:16 +0000 (16:09 +0200)]

btrfs: Mark pinned log extents as excluded

In preparation to making pinned extents per-transaction ensure that log
such extents are always excluded from caching. To achieve this in
addition to marking them via btrfs_pin_extent_for_log_replay they also
need to be marked with btrfs_add_excluded_extent to prevent log tree
extent buffer being loaded by the free space caching thread. That's
required since log tree blocks are not recorded in the extent tree, hence
they always look free.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:15 +0000 (16:09 +0200)]

btrfs: Pass transaction handle to write_pinned_extent_entries

Preparation for refactoring pinned extents tracking.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:14 +0000 (16:09 +0200)]

btrfs: Make pin_down_extent take transaction handle

All callers have a reference to a transaction handle so pass it to
pin_down_extent. This is the final step before switching pinned extent
tracking to a per-transaction basis.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:13 +0000 (16:09 +0200)]

btrfs: Make btrfs_pin_extent_for_log_replay take transaction handle

Preparation for refactoring pinned extents tracking.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:12 +0000 (16:09 +0200)]

btrfs: Make btrfs_pin_reserved_extent take transaction handle

btrfs_pin_reserved_extent is now only called with a valid transaction so
exploit the fact to take a transaction. This is preparation for tracking
pinned extents on a per-transaction basis.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:11 +0000 (16:09 +0200)]

btrfs: Call btrfs_pin_reserved_extent only during active transaction

Calling btrfs_pin_reserved_extent makes sense only with a valid
transaction since pinned extents are processed from transaction commit
in btrfs_finish_extent_commit. In case of error it's sufficient to
adjust the reserved counter to account for log tree extents allocated in
the last transaction.

This commit moves btrfs_pin_reserved_extent to be called only with valid
transaction handle and otherwise uses the newly introduced
unaccount_log_buffer to adjust "reserved". If this is not done if a
failure occurs before transaction is committed WARN_ON are going to be
triggered on unmount. This was especially pronounced with generic/475
test.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:10 +0000 (16:09 +0200)]

btrfs: Introduce unaccount_log_buffer

This function correctly adjusts the reserved bytes occupied by a log
tree extent buffer. It will be used instead of calling
btrfs_pin_reserved_extent.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:09 +0000 (16:09 +0200)]

btrfs: Make btrfs_pin_extent take trans handle

Preparation for switching pinned extent tracking to a per-transaction
basis.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Nikolay Borisov [Mon, 20 Jan 2020 14:09:08 +0000 (16:09 +0200)]

btrfs: Perform pinned cleanup directly in btrfs_destroy_delayed_refs

Having btrfs_destroy_delayed_refs call btrfs_pin_extent is problematic
for making pinned extents tracking per-transaction since
btrfs_trans_handle cannot be passed to btrfs_pin_extent in this context.
Additionally delayed refs heads pinned in btrfs_destroy_delayed_refs
are going to be handled very closely, in btrfs_destroy_pinned_extent.

To enable btrfs_pin_extent to take btrfs_trans_handle simply open code
it in btrfs_destroy_delayed_refs and call btrfs_error_unpin_extent_range
on the range. This enables us to do less work in
btrfs_destroy_pinned_extent and leaves btrfs_pin_extent being called in
contexts which have a valid btrfs_trans_handle.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Anand Jain [Thu, 13 Feb 2020 08:40:53 +0000 (16:40 +0800)]

btrfs: sysfs, unify handler name of devinfo/missing

The devinfo attribute handlers were added in 668e48af7a94 ("btrfs:
sysfs, add devid/dev_state kobject and device attributes") and the name
should contain _devinfo_, there's one that does not conform, so unify it
with the rest.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Anand Jain [Wed, 12 Feb 2020 09:28:13 +0000 (17:28 +0800)]

btrfs: sysfs, rename device_link add/remove functions

Since commit 668e48af7a94 ("btrfs: sysfs, add devid/dev_state kobject and
device attributes"), the functions btrfs_sysfs_add_device_link() and
btrfs_sysfs_rm_device_link() do more than just adding and removing the
device link as its name indicated. Rename them to be more specific
that's about the directory with the attirbutes

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Anand Jain [Wed, 12 Feb 2020 09:28:12 +0000 (17:28 +0800)]

btrfs: sysfs, use btrfs_sysfs_remove_fsid to celanup errors in add_fsid

We have one simple function btrfs_sysfs_remove_fsid() to undo
btrfs_sysfs_add_fsid(), which also does proper checks before releasing
objects.

One difference, if btrfs_sysfs_remove_fsid is used that now we also call
kobject_del() which was missing before. This was tested (with kobject
debug turned on) and no change in behaviour was found.

This is a cleanup patch.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:42 +0000 (19:09 +0100)]

btrfs: sink argument tree to __do_readpage

The tree pointer can be safely read from the inode, use it and drop the
redundant argument.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:40 +0000 (19:09 +0100)]

btrfs: sink arugment tree to contiguous_readpages

The tree pointer can be safely read from the inode, use it and drop the
redundant argument.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:37 +0000 (19:09 +0100)]

btrfs: sink argument tree to __extent_read_full_page

The tree pointer can be safely read from the inode, use it and drop the
redundant argument.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:35 +0000 (19:09 +0100)]

btrfs: sink argument tree to extent_read_full_page

The tree pointer can be safely read from the page's inode, use it and
drop the redundant argument.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:33 +0000 (19:09 +0100)]

btrfs: drop argument tree from btrfs_lock_and_flush_ordered_range

The tree pointer can be safely read from the inode so we can drop the
redundant argument from btrfs_lock_and_flush_ordered_range.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:30 +0000 (19:09 +0100)]

btrfs: add assertions for tree == inode->io_tree to extent IO helpers

Add assertions to all helpers that get tree as argument and verify that
it's the same that can be obtained from the inode or from its pages. In
followup patches the redundant arguments and assertions will be removed
one by one.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:28 +0000 (19:09 +0100)]

btrfs: drop argument tree from submit_extent_page

Now that we're sure the tree from argument is same as the one we can get
from the page's inode io_tree, drop the redundant argument.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 18:09:26 +0000 (19:09 +0100)]

btrfs: remove extent_page_data::tree

All functions that set up extent_page_data::tree set it to the inode
io_tree. That's passed down the callstack that accesses either the same
inode or its pages. In the end submit_extent_page can pull the tree out
of the page and we don't have to store it in the structure.

Reviewed-by: Anand Jain <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 16:34:34 +0000 (17:34 +0100)]

btrfs: add wrapper for transaction abort predicate

The status of aborted transaction can change between calls and it needs
to be accessed by READ_ONCE. Add a helper that also wraps the unlikely
hint.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 5 Feb 2020 16:26:51 +0000 (17:26 +0100)]

btrfs: move root node locking helpers to locking.c

The helpers are related to locking so move them there, update comments.

Reviewed-by: Anand Jain <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:33:01 +0000 (09:33 -0500)]

btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root

We are now using these for all roots, rename them to btrfs_put_root()
and btrfs_grab_root();

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:33:00 +0000 (09:33 -0500)]

btrfs: add a leak check for roots

Now that we're going to start relying on getting ref counting right for
roots, add a list to track allocated roots and print out any roots that
aren't freed up at free_fs_info time.

Hide this behind CONFIG_BTRFS_DEBUG because this will just be used for
developers to verify they aren't breaking things.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:59 +0000 (09:32 -0500)]

btrfs: make the init of static elements in fs_info separate

In adding things like eb leak checking and root leak checking there were
a lot of weird corner cases that come from the fact that

  1) We do not init the fs_info until we get to open_ctree time in the
     normal case and

  2) The test infrastructure half-init's the fs_info for things that it
     needs.

This makes it really annoying to make changes because you have to add
init in two different places, have special cases for testing fs_info's
that may not have certain things initialized, and cases for fs_info's
that didn't make it to open_ctree and thus are not fully set up.

Fix this by extracting out the non-allocating init of the fs info into
it's own public function and use that to make sure we're all getting
consistent views of an allocated fs_info.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:58 +0000 (09:32 -0500)]

btrfs: move fs_info init work into it's own helper function

open_ctree mixes initialization of fs stuff and fs_info stuff, which
makes it confusing when doing things like adding the root leak
detection. Make a separate function that inits all the static
structures inside of the fs_info needed for the fs to operate, and then
call that before we start setting up the fs_info to be mounted.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:57 +0000 (09:32 -0500)]

btrfs: free more things in btrfs_free_fs_info

Things like the percpu_counters, the mapping_tree, and the csum hash can
all be freed at btrfs_free_fs_info time, since the helpers all check if
the structure has been initialized already. This significantly cleans
up the error cases in open_ctree.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:56 +0000 (09:32 -0500)]

btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root

Now that all callers of btrfs_get_fs_root are subsequently calling
btrfs_grab_fs_root and handling dropping the ref when they are done
appropriately, go ahead and push btrfs_grab_fs_root up into
btrfs_get_fs_root.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:55 +0000 (09:32 -0500)]

btrfs: use btrfs_put_fs_root to free roots always

If we are going to track leaked roots we need to free them all the same
way, so don't kfree() roots directly, use btrfs_put_fs_root.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:54 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in open_ctree

We lookup the fs_root and put it in our fs_info directly, we should hold
a ref on this root for the lifetime of the fs_info.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:53 +0000 (09:32 -0500)]

btrfs: export and rename free_fs_info

We're going to start freeing roots and doing other complicated things in
free_fs_info, so we need to move it to disk-io.c and export it in order
to use things lik btrfs_put_fs_root().

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:52 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry

We lookup the uuid of arbitrary subvolumes, hold a ref on the root while
we're doing this.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:51 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_recover_log_trees

We replay the log into arbitrary fs roots, hold a ref on the root while
we're doing this.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:50 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in create_pending_snapshot

We create the snapshot and then use it for a bunch of things, we need to
hold a ref on it while we're messing with it.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Thu, 6 Feb 2020 15:24:26 +0000 (10:24 -0500)]

btrfs: hold a ref on the root in get_subvol_name_from_objectid

We lookup the name of a subvol which means we'll cross into different
roots. Hold a ref while we're doing the look ups in the fs_root we're
searching.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:48 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_ioctl_send

We lookup all the clone roots and the parent root for send, so we need
to hold refs on all of these roots while we're processing them.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:47 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in scrub_print_warning_inode

We look up the root for the bytenr that is failing, so we need to hold a
ref on the root for that operation.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:46 +0000 (09:32 -0500)]

btrfs: hold a ref for the root in btrfs_find_orphan_roots

We lookup roots for every orphan item we have, we need to hold a ref on
the root while we're doing this work.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:45 +0000 (09:32 -0500)]

btrfs: push grab_fs_root into read_fs_root

All of relocation uses read_fs_root to lookup fs roots, so push the
btrfs_grab_fs_root() up into that helper and remove the individual
calls.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:44 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_recover_relocation

We look up the fs root in various places in here when recovering from a
crashed relcoation. Make sure we hold a ref on the root whenever we
look them up.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:43 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in create_reloc_inode

We're creating a reloc inode in the data reloc tree, we need to hold a
ref on the root while we're doing that.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:42 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in find_data_references

We're looking up the data references for the bytenr in a root, we need
to hold a ref on that root while we're doing that.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:41 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in record_reloc_root_in_trans

We are recording this root in the transaction, so we need to hold a ref
on it until we do that.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:40 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in merge_reloc_roots

We look up the corresponding root for the reloc root, we need to hold a
ref while we're messing with it.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:39 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in prepare_to_merge

We look up the reloc roots corresponding root, we need to hold a ref on
that root.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:38 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in build_backref_tree

This is trickier than the previous conversions.  We have backref_node's
that need to hold onto their root for their lifetime.  Do the read of
the root and grab the ref.  If at any point we don't use the root we
discard it, however if we use it in our backref node we don't free it
until we free the backref node.  Any time we switch the root's for the
backref node we need to drop our ref on the old root and grab the ref on
the new root, and if we dupe a node we need to get a ref on the root
there as well.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:37 +0000 (09:32 -0500)]

btrfs: hold ref on root in btrfs_ioctl_default_subvol

We look up an arbitrary fs root here, we need to hold a ref on the root
for the duration.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:36 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_ioctl_get_subvol_info

We look up whatever root userspace has given us, we need to hold a ref
throughout this operation. Use 'root' only for the on fs root and not as
a temporary variable elsewhere.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:35 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_search_path_in_tree_user

We can wander into a different root, so grab a ref on the root we look
up. Later on we make root = fs_info->tree_root so we need this separate
out label to make sure we do the right cleanup only in the case we're
looking up a different root.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:34 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in btrfs_search_path_in_tree

We look up an arbitrary fs root, we need to hold a ref on it while we're
doing our search.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:33 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in search_ioctl

We lookup a arbitrary fs root, we need to hold a ref on that root. If
we're using our own inodes root then grab a ref on that as well to make
the cleanup easier.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:32 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in create_subvol

We're creating the new root here, but we should hold the ref until after
we've initialized the inode for it.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:31 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in fixup_tree_root_location

Looking up the inode from an arbitrary tree means we need to hold a ref
on that root.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:30 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in __btrfs_run_defrag_inode

We are looking up an arbitrary inode, we need to hold a ref on the root
while we're doing this.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:29 +0000 (09:32 -0500)]

btrfs: hold a root ref in btrfs_get_dentry

Looking up the inode we need to search the root, make sure we hold a
reference on that root while we're doing the lookup.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:28 +0000 (09:32 -0500)]

btrfs: hold a ref on the root in resolve_indirect_ref

We're looking up a random root, we need to hold a ref on it while we're
using it.

Reviewed-by: David Sterba <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:27 +0000 (09:32 -0500)]

btrfs: hold a ref on fs roots while they're in the radix tree

If the root is sitting in the radix tree, we should probably have a ref
for the radix tree. Grab a ref on the root when we insert it, and drop
it when it gets deleted.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Tue, 4 Feb 2020 18:18:56 +0000 (13:18 -0500)]

btrfs: describe the space reservation system in general

Add another comment to cover how the space reservation system works
generally. This covers the actual reservation flow, as well as how
flushing is handled.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Tue, 4 Feb 2020 18:18:55 +0000 (13:18 -0500)]

btrfs: add a comment describing delalloc space reservation

delalloc space reservation is tricky because it encompasses both data
and metadata. Make it clear what each side does, the general flow of
how space is moved throughout the lifetime of a write, and what goes
into the calculations.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Tue, 4 Feb 2020 18:18:54 +0000 (13:18 -0500)]

btrfs: add a comment describing block reserves

This is a giant comment at the top of block-rsv.c describing generally
how block reserves work. It is purely about the block reserves
themselves, and nothing to do with how the actual reservation system
works.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:26 +0000 (09:32 -0500)]

btrfs: handle NULL roots in btrfs_put/btrfs_grab_fs_root

We want to use this for dropping all roots, and in some error cases we
may not have a root, so handle this to make the cleanup code easier.
Make btrfs_grab_fs_root the same so we can use it in cases where the
root may not exist (like the quota root).

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:25 +0000 (09:32 -0500)]

btrfs: make the fs root init functions static

Now that the orphan cleanup stuff doesn't use this directly we can just
make them static.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:24 +0000 (09:32 -0500)]

btrfs: open code btrfs_read_fs_root_no_name

All this does is call btrfs_get_fs_root() with check_ref == true. Just
use btrfs_get_fs_root() so we don't have a bunch of different helpers
that do the same thing.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:23 +0000 (09:32 -0500)]

btrfs: remove btrfs_read_fs_root, not used anymore

All helpers should either be using btrfs_get_fs_root() or
btrfs_read_tree_root().

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:22 +0000 (09:32 -0500)]

btrfs: make relocation use btrfs_read_tree_root()

Relocation has it's special roots, we don't want to save these in the
root cache either, so swap it to use btrfs_read_tree_root(). However
the reloc root does need REF_COWS set, so make sure we set it everywhere
we use this helper, as it no longer does the REF_COWS setting.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:21 +0000 (09:32 -0500)]

btrfs: export and use btrfs_read_tree_root for tree-log

Tree-log uses btrfs_read_fs_root to load its log, but this just calls
btrfs_read_tree_root. We don't save the log roots in our root cache, so
just export this helper and use it in the logging code.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Josef Bacik [Fri, 24 Jan 2020 14:32:20 +0000 (09:32 -0500)]

btrfs: make btrfs_find_orphan_roots use btrfs_get_fs_root

btrfs_find_orphan_roots has this weird thing where it looks up the root
in cache to see if it is there before just reading the root.  But the
read it uses just reads the root, it doesn't do any of the init work, we
do that by hand here.  But this is unnecessary, all we really want is to
see if the root still exists and add it to the dead roots list to be
cleaned up, otherwise we delete the orphan item.

Fix this by just using btrfs_get_fs_root directly with check_ref set to
false so we get the orphan root items.  Then we just handle in cache and
out of cache roots the same, add them to the dead roots list and carry
on.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Empty description

RSS Atom

This page took 0.172873 seconds and 4 git commands to generate.