Git Repo - linux.git/log

NFSD: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Guobin Huang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: Remove unused function ip_map_lookup

Fix the following clang warnings:

net/sunrpc/svcauth_unix.c:306:30: warning: unused function
'ip_map_lookup' [-Wunused-function].

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSv4.2: fix copy stateid copying for the async copy

This patch fixes Dan Carpenter's report that the static checker
found a problem where memcpy() was copying into too small of a buffer.

Reported-by: Dan Carpenter <[email protected]>
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Dai Ngo <[email protected]>

UAPI: nfsfh.h: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged:

$ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o
struct nfs_fhbase_new {
        union {
                struct {
                        __u8       fb_version_aux;       /*     0     1 */
                        __u8       fb_auth_type_aux;     /*     1     1 */
                        __u8       fb_fsid_type_aux;     /*     2     1 */
                        __u8       fb_fileid_type_aux;   /*     3     1 */
                        __u32      fb_auth[1];           /*     4     4 */
                };                                       /*     0     8 */
                struct {
                        __u8       fb_version;           /*     0     1 */
                        __u8       fb_auth_type;         /*     1     1 */
                        __u8       fb_fsid_type;         /*     2     1 */
                        __u8       fb_fileid_type;       /*     3     1 */
                        __u32      fb_auth_flex[0];      /*     4     0 */
                };                                       /*     0     4 */
        };                                               /*     0     8 */

        /* size: 8, cachelines: 1, members: 1 */
        /* last cacheline: 8 bytes */
};

Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warnings:

fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’:
fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |                              ~~~~~~~~~~~^~~
./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’
   12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))
      |                                              ^~
./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’
   40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x))
      |                          ^~~~~~~~
./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’
  136 | #define ___ntohl(x) __be32_to_cpu(x)
      |                     ^~~~~~~~~~~~~
./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’
  140 | #define ntohl(x) ___ntohl(x)
      |                  ^~~~~~~~
fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |        ^~~~~
fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |                     ~~~~~~~~~~~^~~
fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |    ~~~~~~~~~~~^~~

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()

This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg

These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove sc_read_complete_q

Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Single-stage RDMA Read

Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.

Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().

The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Tom Talpey <[email protected]>

SUNRPC: Move svc_xprt_received() call sites

Currently, XPT_BUSY is not cleared until xpo_recvfrom returns.
That effectively blocks the receipt and handling of the next RPC
message until the current one has been taken off the transport.
This strict ordering is a requirement for socket transports.

For our kernel RPC/RDMA transport implementation, however, dequeuing
an ingress message is nothing more than a list_del(). The transport
can safely be marked un-busy as soon as that is done.

To keep the changes simpler, this patch just moves the
svc_xprt_received() call site from svc_handle_xprt() into the
transports, so that the actual optimization can be done in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Export svc_xprt_received()

Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Retain the page backing rq_res.head[0].iov_base

svc_rdma_sendto() now waits for the NIC hardware to finish with
the pages backing rq_res. We still have to release the page array
in some cases, but now it's always safe to immediately re-use the
page backing rq_res's head buffer.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove unused sc_pages field

Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Normalize Send page handling

Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add a "deferred close" helper

Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Maintain a Receive water mark

Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Use svc_rdma_refresh_recvs() in wc_receive

Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.

Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add a batch Receive posting mechanism

Introduce a server-side mechanism similar to commit e340c2d6ef2a
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove stale comment for svc_rdma_wc_receive()

xprt pinning was removed in commit 365e9992b90f ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Provide an explanatory comment in CMA event handler

Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: RPCDBG_FACILITY is no longer used

Signed-off-by: Chuck Lever <[email protected]>

nfsd: report client confirmation status in "info" file

mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in liu of the logging
of mount/unmount events for NFSv3.

Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.

So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.

This requires a bit of infrastructure to keep the dentry for the "info"
file. There is no need to take a counted reference as the dentry must
remain around until the client is removed.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't ignore high bits of copy count

Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.

Reported-by: <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: COPY with length 0 should copy to end of file

>From https://tools.ietf.org/html/rfc7862#page-65

A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.

Reported-by: <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Fix typo "accesible"

Trivial fix.

Cc: [email protected]
Signed-off-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted

In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.

This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Log client tracking type log message as info instead of warning

`printk()`, by default, uses the log level warning, which leaves the
user reading

NFSD: Using UMH upcall client tracking operations.

wondering what to do about it (`dmesg --level=warn`).

Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so use info-level instead of
debug-level.

Cc: [email protected]
Signed-off-by: Paul Menzel <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: helper for laundromat expiry calculations

We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.

Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up NFSDDBG_FACILITY macro

These are no longer needed because there are no dprintk() call sites
in these files.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add a tracepoint to record directory entry encoding

Enable watching the progress of directory encoding to capture the
timing of any issues with reading or encoding a directory. The
new tracepoint captures dirent encoding for all NFS versions.

For example, here's what a few NFSv4 directory entries might look
like:

nfsd-989   [002]   468.596265: nfsd_dirent:          fh_hash=0x5d162594 ino=2 name=.
nfsd-989   [002]   468.596267: nfsd_dirent:          fh_hash=0x5d162594 ino=1 name=..
nfsd-989   [002]   468.596299: nfsd_dirent:          fh_hash=0x5d162594 ino=3827 name=zlib.c
nfsd-989   [002]   468.596325: nfsd_dirent:          fh_hash=0x5d162594 ino=3811 name=xdiff
nfsd-989   [002]   468.596351: nfsd_dirent:          fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h
nfsd-989   [002]   468.596377: nfsd_dirent:          fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up after updating NFSv3 ACL encoders

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up after updating NFSv2 ACL encoders

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream

The SETACL result encoder is exactly the same as the NFSv2
attrstatres decoder.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove unused NFSv2 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: Add helper function similar to nfs3svc_encode_cookie3().

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv2 stat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations

During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances
rq_next_page to the full size of the client-requested buffer, then
releases all those pages at the end of the request. The next request
to use that nfsd thread has to refill the pages.

NFSD does this even when the dirlist in the reply is small. With
NFSv3 clients that send READDIR operations with large buffer sizes,
that can be 256 put_page/alloc_page pairs per READDIR request, even
though those pages often remain unused.

We can save some work by not releasing dirlist buffer pages that
were not used to form the READDIR Reply. I've left the NFSv2 code
alone since there are never more than three pages involved in an
NFSv2 READDIR Reply.

Eventually we should nail down why these pages need to be released
at all in order to avoid allocating and releasing pages
unnecessarily.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove unused NFSv3 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream

The benefit of the xdr_stream helpers is that they transparently
handle encoding an XDR data item that crosses page boundaries.
Most of the open-coded logic to do that here can be eliminated.

A sub-buffer and sub-stream are set up as a sink buffer for the
directory entry encoder. As an entry is encoded, it is added to
the end of the content in this buffer/stream. The total length of
the directory list is tracked in the buffer's @len field.

When it comes time to encode the Reply, the sub-buffer is merged
into rq_res's page array at the correct place using
xdr_write_pages().

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: De-duplicate identical code that handles encoding of
directory offset cookies across page boundaries.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream

As an additional clean up, encode_wcc_data() is removed because it
is now no longer used.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream

Also, clean up: Rename the encoder function to match the name of
the result structure in RFC 1813, consistent with other encoder
function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update the GETATTR3res encoder to use struct xdr_stream

As an additional clean up, some renaming is done to more closely
reflect the data type and variable names used in the NFSv3 XDR
definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <[email protected]>

Linux 5.12-rc4

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for v5.12"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: initialize ret to suppress smatch warning
  ext4: stop inode update before return
  ext4: fix rename whiteout with fast commit
  ext4: fix timer use-after-free on failed mount
  ext4: fix potential error in ext4_do_update_inode
  ext4: do not try to set xattr into ea_inode if value is empty
  ext4: do not iput inode under running transaction in ext4_rename()
  ext4: find old entry again if failed to rename whiteout
  ext4: fix error handling in ext4_end_enable_verity()
  ext4: fix bh ref count on error paths
  fs/ext4: fix integer overflow in s_log_groups_per_flex
  ext4: add reclaim checks to xattr code
  ext4: shrink race window in ext4_should_retry_alloc()

Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block

Pull io_uring followup fixes from Jens Axboe:

- The SIGSTOP change from Eric, so we properly ignore that for
   PF_IO_WORKER threads.

- Disallow sending signals to PF_IO_WORKER threads in general, we're
   not interested in having them funnel back to the io_uring owning
   task.

- Stable fix from Stefan, ensuring we properly break links for short
   send/sendmsg recv/recvmsg if MSG_WAITALL is set.

- Catch and loop when needing to run task_work before a PF_IO_WORKER
   threads goes to sleep.

* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
  io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  io-wq: ensure task is running before processing task_work
  signal: don't allow STOP on PF_IO_WORKER threads
  signal: don't allow sending any signals to PF_IO_WORKER threads

Merge tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
"Some small staging and IIO driver fixes:

   - MAINTAINERS changes for the move of the staging mailing list

   - comedi driver fixes to get request_irq() to work correctly

   - counter driver fixes for reported issues with iio devices

   - tiny iio driver fixes for reported issues.

  All of these have been in linux-next with no reported problems"

* tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vt665x: fix alignment constraints
  staging: comedi: cb_pcidas64: fix request_irq() warn
  staging: comedi: cb_pcidas: fix request_irq() warn
  MAINTAINERS: move the staging subsystem to lists.linux.dev
  MAINTAINERS: move some real subsystems off of the staging mailing list
  iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
  iio: hid-sensor-temperature: Fix issues of timestamp channel
  iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
  counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
  counter: stm32-timer-cnt: fix ceiling write max value
  counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
  iio: adc: ab8500-gpadc: Fix off by 10 to 3
  iio:adc:stm32-adc: Add HAS_IOMEM dependency
  iio: adis16400: Fix an error code in adis16400_initial_setup()
  iio: adc: adi-axi-adc: add proper Kconfig dependencies
  iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
  iio: hid-sensor-prox: Fix scale not correct issue
  iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel

Merge tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB and Thunderbolt driver fixes from Greg KH:
"Here are some small Thunderbolt and USB driver fixes for some reported
  issues:

   - thunderbolt fixes for minor problems

   - typec fixes for power issues

   - usb-storage quirk addition

   - usbip bugfix

   - dwc3 bugfix when stopping transfers

   - cdnsp bugfix for isoc transfers

   - gadget use-after-free fix

  All have been in linux-next this week with no reported issues"

* tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy
  usb: dwc3: gadget: Prevent EP queuing while stopping transfers
  usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
  usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
  usb-storage: Add quirk to defeat Kindle's automatic unload
  usb: gadget: configfs: Fix KASAN use-after-free
  usbip: Fix incorrect double assignment to udc->ud.tcp_rx
  usb: cdnsp: Fixes incorrect value in ISOC TRB
  thunderbolt: Increase runtime PM reference count on DP tunnel discovery
  thunderbolt: Initialize HopID IDAs in tb_switch_alloc()

Merge tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
"A change to robustify force-threaded IRQ handlers to always disable
  interrupts, plus a DocBook fix.

  The force-threaded IRQ handler change has been accelerated from the
  normal schedule of such a change to keep the bad pattern/workaround of
  spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from
  spreading"

* tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Disable interrupts for force threaded handlers
  genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)

Merge tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Boundary condition fixes for bugs unearthed by the perf fuzzer"

* tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
perf/x86/intel: Fix a crash caused by zero PEBS status

Merge tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:

- Get static calls & modules right. Hopefully.

- WW mutex fixes

* tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call: Fix static_call_update() sanity check
  static_call: Align static_call_is_init() patching condition
  static_call: Fix static_call_set_init()
  locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
  locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling

Merge tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

- another missing RT_PROP table related fix, to ensure that the
   efivarfs pseudo filesystem fails gracefully if variable services
   are unsupported

- use the correct alignment for literal EFI GUIDs

- fix a use after unmap issue in the memreserve code

* tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: use 32-bit alignment for efi_guid_t literals
  firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
  efivars: respect EFI_UNSUPPORTED return from firmware

Merge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
"The freshest pile of shiny x86 fixes for 5.12:

   - Add the arch-specific mapping between physical and logical CPUs to
     fix devicetree-node lookups

   - Restore the IRQ2 ignore logic

   - Fix get_nr_restart_syscall() to return the correct restart syscall
     number. Split in a 4-patches set to avoid kABI breakage when
     backporting to dead kernels"

* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/of: Fix CPU devicetree-node lookups
  x86/ioapic: Ignore IRQ2 again
  x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
  x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
  x86: Move TS_COMPAT back to asm/thread_info.h
  kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()

Merge tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix a possible stack corruption and subsequent DLPAR failure in the
   rpadlpar_io PCI hotplug driver

- Two build fixes for uncommon configurations

Thanks to Christophe Leroy and Tyrel Datwyler.

* tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  PCI: rpadlpar: Fix potential drc_name corruption in store functions
  powerpc: Force inlining of cpu_has_feature() to avoid build failure
  powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: [email protected]
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io-wq: ensure task is running before processing task_work

Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.

Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
Signed-off-by: Jens Axboe <[email protected]>

signal: don't allow STOP on PF_IO_WORKER threads

Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.

Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>

signal: don't allow sending any signals to PF_IO_WORKER threads

They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatability thing that we need not care about for the IO
threads.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

ext4: initialize ret to suppress smatch warning

Signed-off-by: Theodore Ts'o <[email protected]>

ext4: stop inode update before return

The inode update should be stopped before returing the error code.

Signed-off-by: Pan Bian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: [email protected]
Reviewed-by: Harshad Shirwadkar <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix rename whiteout with fast commit

This patch adds rename whiteout support in fast commits. Note that the
whiteout object that gets created is actually char device. Which
imples, the function ext4_inode_journal_mode(struct inode *inode)
would return "JOURNAL_DATA" for this inode. This has a consequence in
fast commit code that it will make creation of the whiteout object a
fast-commit ineligible behavior and thus will fall back to full
commits. With this patch, this can be observed by running fast commits
with rename whiteout and seeing the stats generated by ext4_fc_stats
tracepoint as follows:

ext4_fc_stats: dev 254:32 fc ineligible reasons:
XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
num_commits:6, ineligible: 6, numblks: 3

So in short, this patch guarantees that in case of rename whiteout, we
fall back to full commits.

Amir mentioned that instead of creating a new whiteout object for
every rename, we can create a static whiteout object with irrelevant
nlink. That will make fast commits to not fall back to full
commit. But until this happens, this patch will ensure correctness by
falling back to full commits.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: [email protected]
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix timer use-after-free on failed mount

When filesystem mount fails because of corrupted filesystem we first
cancel the s_err_report timer reminding fs errors every day and only
then we flush s_error_work. However s_error_work may report another fs
error and re-arm timer thus resulting in timer use-after-free. Fix the
problem by first flushing the work and only after that canceling the
s_err_report timer.

Reported-by: [email protected]
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix potential error in ext4_do_update_inode

If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(),
the error code will be overridden, go to out_brelse to avoid this
situation.

Signed-off-by: Shijie Luo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: do not try to set xattr into ea_inode if value is empty

Syzbot report a warning that ext4 may create an empty ea_inode if set
an empty extent attribute to a file on the file system which is no free
blocks left.

  WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
  ...
  Call trace:
   ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
   ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942
   ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390
   ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491
   ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37
   __vfs_setxattr+0x208/0x23c fs/xattr.c:177
  ...

Now, ext4 try to store extent attribute into an external inode if
ext4_xattr_block_set() return -ENOSPC, but for the case of store an
empty extent attribute, store the extent entry into the extent
attribute block is enough. A simple reproduce below.

  fallocate test.img -l 1M
  mkfs.ext4 -F -b 2048 -O ea_inode test.img
  mount test.img /mnt
  dd if=/dev/zero of=/mnt/foo bs=2048 count=500
  setfattr -n "user.test" /mnt/foo

Reported-by: [email protected]
Fixes: 9c6e7853c531 ("ext4: reserve space for xattr entries/names")
Cc: [email protected]
Signed-off-by: zhangyi (F) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: do not iput inode under running transaction in ext4_rename()

In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into
directory, it ends up dropping new created whiteout inode under the
running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode
under running transaction"), we follow the assumptions that evict() does
not get called from a transaction context but in ext4_rename() it breaks
this suggestion. Although it's not a real problem, better to obey it, so
this patch add inode to orphan list and stop transaction before final
iput().

Signed-off-by: zhangyi (F) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: find old entry again if failed to rename whiteout

If we failed to add new entry on rename whiteout, we cannot reset the
old->de entry directly, because the old->de could have moved from under
us during make indexed dir. So find the old entry again before reset is
needed, otherwise it may corrupt the filesystem as below.

  /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED.
  /dev/sda: Unattached inode 75
  /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT")
Cc: [email protected]
Signed-off-by: zhangyi (F) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

genirq: Disable interrupts for force threaded handlers

With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:

thread(irq_A)
  irq_handler(A)
    spin_lock(&foo->lock);

interrupt(irq_B)
  irq_handler(B)
    spin_lock(&foo->lock);

This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.

Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.

Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.

For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.

For RT kernels there is no issue.

Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes for 5.12:

   - fix the SBI remote fence numbers for hypervisor fences, which had
     been transcribed in the wrong order in Linux. These fences are only
     used with the KVM patches applied.

   - fix a whole host of build warnings, these should have no functional
     change.

   - fix init_resources() to prevent an off-by-one error from causing an
     out-of-bounds array reference. This was manifesting during boot on
     vexriscv.

   - ensure the KASAN mappings are visible before proceeding to use
     them"

* tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Correct SPARSEMEM configuration
  RISC-V: kasan: Declare kasan_shallow_populate() static
  riscv: Ensure page table writes are flushed when initializing KASAN vmalloc
  RISC-V: Fix out-of-bounds accesses in init_resources()
  riscv: Fix compilation error with Canaan SoC
  ftrace: Fix spelling mistake "disabed" -> "disabled"
  riscv: fix bugon.cocci warnings
  riscv: process: Fix no prototype for arch_dup_task_struct
  riscv: ftrace: Use ftrace_get_regs helper
  riscv: process: Fix no prototype for show_regs
  riscv: syscall_table: Reduce W=1 compilation warnings noise
  riscv: time: Fix no prototype for time_init
  riscv: ptrace: Fix no prototype warnings
  riscv: sbi: Fix comment of __sbi_set_timer_v01
  riscv: irq: Fix no prototype warning
  riscv: traps: Fix no prototype warnings
  RISC-V: correct enum sbi_ext_rfence_fid

Merge tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes - three for stable, including an important ACL
  fix and security signature fix"

* tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix allocation size on newly created files
  cifs: warn and fail if trying to use rootfs without the config option
  fs/cifs/: fix misspellings using codespell tool
  cifs: Fix preauth hash corruption
  cifs: update new ACE pointer after populate_new_aces.

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Eight fixes, all in drivers, all fairly minor either being fixes in
  error legs, memory leaks on teardown, context errors or semantic
  problems"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpt3sas: Do not use GFP_KERNEL in atomic context
  scsi: ufs: ufs-mediatek: Correct operator & -> &&
  scsi: sd_zbc: Update write pointer offset cache
  scsi: lpfc: Fix some error codes in debugfs
  scsi: qla2xxx: Fix broken #endif placement
  scsi: st: Fix a use after free in st_open()
  scsi: myrs: Fix a double free in myrs_cleanup()
  scsi: ibmvfc: Free channel_setup_buf during device tear down

Merge tag 'zonefs-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

- fix inode write open reference count (Chao)

- Fix wrong write offset for asynchronous O_APPEND writes (me)

- Prevent use of sequential zone file as swap files (me)

* tag 'zonefs-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
  zonefs: Fix O_APPEND async write handling
  zonefs: prevent use of seq files as swap file

Merge tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Just an NVMe pull request this week:

   - fix tag allocation for keep alive

   - fix a unit mismatch for the Write Zeroes limits

   - various TCP transport fixes (Sagi Grimberg, Elad Grupi)

   - fix iosqes and iocqes validation for discovery controllers (Sagi Grimberg)"

* tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
  nvmet-tcp: fix kmap leak when data digest in use
  nvmet: don't check iosqes,iocqes for discovery controllers
  nvme-rdma: fix possible hang when failing to set io queues
  nvme-tcp: fix possible hang when failing to set io queues
  nvme-tcp: fix misuse of __smp_processor_id with preemption enabled
  nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDU
  nvme: fix Write Zeroes limitations
  nvme: allocate the keep alive request using BLK_MQ_REQ_NOWAIT
  nvme: merge nvme_keep_alive into nvme_keep_alive_work
  nvme-fabrics: only reserve a single tag

Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Quieter week this time, which was both expected and desired. About
  half of the below is fixes for this release, the other half are just
  fixes in general. In detail:

   - Fix the freezing of IO threads, by making the freezer not send them
     fake signals. Make them freezable by default.

   - Like we did for personalities, move the buffer IDR to xarray. Kills
     some code and avoids a use-after-free on teardown.

   - SQPOLL cleanups and fixes (Pavel)

   - Fix linked timeout race (Pavel)

   - Fix potential completion post use-after-free (Pavel)

   - Cleanup and move internal structures outside of general kernel view
     (Stefan)

   - Use MSG_SIGNAL for send/recv from io_uring (Stefan)"

* tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
  io_uring: don't leak creds on SQO attach error
  io_uring: use typesafe pointers in io_uring_task
  io_uring: remove structures from include/linux/io_uring.h
  io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
  io_uring: fix sqpoll cancellation via task_work
  io_uring: add generic callback_head helpers
  io_uring: fix concurrent parking
  io_uring: halt SQO submission on ctx exit
  io_uring: replace sqd rw_semaphore with mutex
  io_uring: fix complete_post use ctx after free
  io_uring: fix ->flags races by linked timeouts
  io_uring: convert io_buffer_idr to XArray
  io_uring: allow IO worker threads to be frozen
  kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing