Git Repo - linux.git/log

sfc: fix build warnings on 32-bit

Truncation of DMA_BIT_MASK to 32-bit dma_addr_t is semantically safe,
but the compiler was warning because it was happening implicitly.
Insert explicit casts to suppress the warnings.

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Acked-by: Randy Dunlap <[email protected]> # build-tested
Signed-off-by: David S. Miller <[email protected]>

net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"

There are a couple of spelling mistakes in comment text. Fix these.

Signed-off-by: Kaige Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

riscv: Add SiFive drivers to rv32_defconfig

This adds SiFive drivers to rv32_defconfig, to keep in sync with the
64-bit config. This is useful when testing 32-bit kernel with QEMU
'sifive_u' 32-bit machine.

Signed-off-by: Bin Meng <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

dt-bindings: timer: Add CLINT bindings

We add DT bindings documentation for CLINT device.

Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Tested-by: Emil Renner Berhing <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Remove CLINT related code from timer and arch

Right now the RISC-V timer driver is convoluted to support:
1. Linux RISC-V S-mode (with MMU) where it will use TIME CSR for
   clocksource and SBI timer calls for clockevent device.
2. Linux RISC-V M-mode (without MMU) where it will use CLINT MMIO
   counter register for clocksource and CLINT MMIO compare register
   for clockevent device.

We now have a separate CLINT timer driver which also provide CLINT
based IPI operations so let's remove CLINT MMIO related code from
arch/riscv directory and RISC-V timer driver.

Signed-off-by: Anup Patel <[email protected]>
Tested-by: Emil Renner Berhing <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

clocksource/drivers: Add CLINT timer driver

We add a separate CLINT timer driver for Linux RISC-V M-mode (i.e.
RISC-V NoMMU kernel).

The CLINT MMIO device provides three things:
1. 64bit free running counter register
2. 64bit per-CPU time compare registers
3. 32bit per-CPU inter-processor interrupt registers

Unlike other timer devices, CLINT provides IPI registers along with
timer registers. To use CLINT IPI registers, the CLINT timer driver
provides IPI related callbacks to arch/riscv.

Signed-off-by: Anup Patel <[email protected]>
Tested-by: Emil Renner Berhing <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Add mechanism to provide custom IPI operations

We add mechanism to set custom IPI operations so that CLINT driver
from drivers directory can provide custom IPI operations.

Signed-off-by: Anup Patel <[email protected]>
Tested-by: Emil Renner Berhing <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
"Fix more fallout from the dma-pool changes (Nicolas Saenz Julienne,
  me)"

* tag 'dma-mapping-5.9-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-pool: Only allocate from CMA when in same memory zone
  dma-pool: fix coherent pool allocations for IOMMU mappings

afs: Fix key ref leak in afs_put_operation()

The afs_put_operation() function needs to put the reference to the key
that's authenticating the operation.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Reported-by: Dave Botsch <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for 5.9-rc2
from Viresh Kumar:

"This contains the following fixes for 5.9:

- Fix re-enabling of resources (Rajendra Nayak).

- Put OPP table references (Stephen Boyd)."

* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Enable resources again if they were disabled earlier
  opp: Put opp table in dev_pm_opp_set_rate() if _set_opp_bw() fails
  opp: Put opp table in dev_pm_opp_set_rate() for empty tables

libbpf: Fix map index used in error message

The error message emitted by bpf_object__init_user_btf_maps() was using the
wrong section ID.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

powerpc/pseries: Do not initiate shutdown when system is running on UPS

As per PAPR we have to look for both EPOW sensor value and event
modifier to identify the type of event and take appropriate action.

In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes":

  SYSTEM_SHUTDOWN 3

  The system must be shut down. An EPOW-aware OS logs the EPOW error
  log information, then schedules the system to be shut down to begin
  after an OS defined delay internal (default is 10 minutes.)

Then in section 10.3.2.2.8 there is table 146 "Platform Event Log
Format, Version 6, EPOW Section", which includes the "EPOW Event
Modifier":

  For EPOW sensor value = 3
  0x01 = Normal system shutdown with no additional delay
  0x02 = Loss of utility power, system is running on UPS/Battery
  0x03 = Loss of system critical functions, system should be shutdown
  0x04 = Ambient temperature too high
  All other values = reserved

We have a user space tool (rtas_errd) on LPAR to monitor for
EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown
after predefined time. It also starts monitoring for any new EPOW
events. If it receives "Power restored" event before predefined time
it will cancel the shutdown. Otherwise after predefined time it will
shutdown the system.

Commit 79872e35469b ("powerpc/pseries: All events of
EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of
the "on UPS/Battery" case, to immediately shutdown the system. This
breaks existing setups that rely on the userspace tool to delay
shutdown and let the system run on the UPS.

Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
Cc: [email protected] # v4.0+
Signed-off-by: Vasant Hegde <[email protected]>
[mpe: Massage change log and add PAPR references]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

io_uring: kill extra iovec=NULL in import_iovec()

If io_import_iovec() returns an error, return iovec is undefined and
must not be used, so don't set it to NULL when failing.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: comment on kfree(iovec) checks

kfree() handles NULL pointers well, but io_{read,write}() checks it
because of performance reasons. Leave a comment there for those who are
tempted to patch it.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix racy req->flags modification

Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
io_cqring_overflow_flush() are racy, because they might be called
asynchronously.

REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can
be guaranteed that requests _currently_ marked inflight can't be
overflown, the problem will be solved with removing the flag
altogether.

That's how the patch works, it removes inflight status of a request
in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
list. That's Ok to do, because no opcode specific handling can be done
after io_cqring_fill_event(), the same assumption as with "struct
io_completion" patches.
And it already have a good place for such cleanups, which is
io_clean_op(). A nice side effect of this is removing this inflight
check from the hot path.

note on synchronisation: now __io_cqring_fill_event() may be taking two
spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
because we never do that in reverse order, and CQ-overflow of inflight
requests shouldn't happen often.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Revert "RDMA/hns: Reserve one sge in order to avoid local length error"

This patch caused some issues on SEND operation, and it should be reverted
to make the drivers work correctly. There will be a better solution that
has been tested carefully to solve the original problem.

This reverts commit 711195e57d341e58133d92cf8aaab1db24e4768d.

Fixes: 711195e57d34 ("RDMA/hns: Reserve one sge in order to avoid local length error")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request

The following message occurs when running an AI application with TID RDMA
enabled:

hfi1 0000:7f:00.0: hfi1_0: [QP74] hfi1_tid_timeout 4084
hfi1 0000:7f:00.0: hfi1_0: [QP70] hfi1_tid_timeout 4084

The issue happens when TID RDMA WRITE request is followed by an
IB_WR_RDMA_WRITE_WITH_IMM request, the latter could be completed first on
the responder side. As a result, no ACK packet for the latter could be
sent because the TID RDMA WRITE request is still being processed on the
responder side.

When the TID RDMA WRITE request is eventually completed, the requester
will wait for the IB_WR_RDMA_WRITE_WITH_IMM request to be acknowledged.

If the next request is another TID RDMA WRITE request, no TID RDMA WRITE
DATA packet could be sent because the preceding IB_WR_RDMA_WRITE_WITH_IMM
request is not completed yet.

Consequently the IB_WR_RDMA_WRITE_WITH_IMM will be retried but it will be
ignored on the responder side because the responder thinks it has already
been completed. Eventually the retry will be exhausted and the qp will be
put into error state on the requester side. On the responder side, the TID
resource timer will eventually expire because no TID RDMA WRITE DATA
packets will be received for the second TID RDMA WRITE request. There is
also risk of a write-after-write memory corruption due to the issue.

Fix by adding a requester side interlock to prevent any potential data
corruption and TID RDMA protocol error.

Fixes: a0b34f75ec20 ("IB/hfi1: Add interlock between a TID RDMA request and other requests")
Link: https://lore.kernel.org/r/[email protected]
Cc: <[email protected]> # 5.4.x+
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Do not add user qps to flushlist

Driver shall add only the kernel qps to the flush list for clean up.
During async error events from the HW, driver is adding qps to this list
without checking if the qp is kernel qp or not.

Add a check to avoid user qp addition to the flush list.

Fixes: 942c9b6ca8de ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing")
Fixes: c50866e2853a ("bnxt_re: fix the regression due to changes in alloc_pbl")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"

There is a spelling mistake in a pr_warn message. Fix it.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

powerpc/perf: Fix soft lockups due to missed interrupt accounting

Performance monitor interrupt handler checks if any counter has
overflown and calls record_and_restart() in core-book3s which invokes
perf_event_overflow() to record the sample information. Apart from
creating sample, perf_event_overflow() also does the interrupt and
period checks via perf_event_account_interrupt().

Currently we record information only if the SIAR (Sampled Instruction
Address Register) valid bit is set (using siar_valid() check) and
hence the interrupt check.

But it is possible that we do sampling for some events that are not
generating valid SIAR, and hence there is no chance to disable the
event if interrupts are more than max_samples_per_tick. This leads to
soft lockup.

Fix this by adding perf_event_account_interrupt() in the invalid SIAR
code path for a sampling event. ie if SIAR is invalid, just do
interrupt check and don't record the sample information.

Reported-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Documentation: efi: remove description of efi=old_map

The old EFI runtime region mapping logic that was kept around for some
time has finally been removed entirely, along with the SGI UV1 support
code that was its last remaining user. So remove any mention of the
efi=old_map command line parameter from the docs.

Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

efi/x86: Move 32-bit code into efi_32.c

Now that the old memmap code has been removed, some code that was left
behind in arch/x86/platform/efi/efi.c is only used for 32-bit builds,
which means it can live in efi_32.c as well. So move it over.

Signed-off-by: Ard Biesheuvel <[email protected]>

efi/libstub: Handle unterminated cmdline

Make the command line parsing more robust, by handling the case it is
not NUL-terminated.

Use strnlen instead of strlen, and make sure that the temporary copy is
NUL-terminated before parsing.

Cc: <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

efi/libstub: Handle NULL cmdline

Treat a NULL cmdline the same as empty. Although this is unlikely to
happen in practice, the x86 kernel entry does check for NULL cmdline and
handles it, so do it here as well.

Cc: <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

efi/libstub: Stop parsing arguments at "--"

Arguments after "--" are arguments for init, not for the kernel.

Cc: <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

efi: add missed destroy_workqueue when efisubsys_init fails

destroy_workqueue() should be called to destroy efi_rts_wq
when efisubsys_init() init resources fails.

Cc: <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Li Heng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

efi/x86: Mark kernel rodata non-executable for mixed mode

When remapping the kernel rodata section RO in the EFI pagetables, the
protection flags that were used for the text section are being reused,
but the rodata section should not be marked executable.

Cc: <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>

opp: Enable resources again if they were disabled earlier

dev_pm_opp_set_rate() can now be called with freq = 0 in order
to either drop performance or bandwidth votes or to disable
regulators on platforms which support them.

In such cases, a subsequent call to dev_pm_opp_set_rate() with
the same frequency ends up returning early because 'old_freq == freq'

Instead make it fall through and put back the dropped performance
and bandwidth votes and/or enable back the regulators.

Cc: v5.3+ <[email protected]> # v5.3+
Fixes: cd7ea582866f ("opp: Make dev_pm_opp_set_rate() handle freq = 0 to drop performance votes")
Reported-by: Sajida Bhanu <[email protected]>
Reviewed-by: Sibi Sankar <[email protected]>
Reported-by: Matthias Kaehlcke <[email protected]>
Tested-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: Rajendra Nayak <[email protected]>
[ Viresh: Don't skip clk_set_rate() and massaged changelog ]
Signed-off-by: Viresh Kumar <[email protected]>

Fix build error when CONFIG_ACPI is not set/enabled:

../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
acpi_noirq_set();

Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Juergen Gross <[email protected]>

efi: avoid error message when booting under Xen

efifb_probe() will issue an error message in case the kernel is booted
as Xen dom0 from UEFI as EFI_MEMMAP won't be set in this case. Avoid
that message by calling efi_mem_desc_lookup() only if EFI_MEMMAP is set.

Fixes: 38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
Signed-off-by: Juergen Gross <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

Merge tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

- Fix lockdep issue reported for recursive read-lock (Alex Williamson)

- Fix missing unwind in type1 replay function (Alex Williamson)

* tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio:
vfio/type1: Add proper error unwind for vfio_iommu_replay()
vfio-pci: Avoid recursive read-lock usage

Revert "drm/amdgpu: disable gfxoff for navy_flounder"

This reverts commit 9c9b17a7d19a8e21db2e378784fff1128b46c9d3.
Newly released sdma fw (51.52) provides a fix for the issue.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

powerpc/powernv/pci: Fix possible crash when releasing DMA resources

Fix a typo introduced during recent code cleanup, which could lead to
silently not freeing resources or an oops message (on PCI hotplug or
CAPI reset).

Only impacts ioda2, the code path for ioda1 is correct.

Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state")
Signed-off-by: Frederic Barrat <[email protected]>
Reviewed-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()

Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way,
when probe fails, netdev can be freed automatically.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atlantic: Use readx_poll_timeout() for large timeout

Commit
8dcf2ad39fdb2 ("net: atlantic: add hwmon getter for MAC temperature")

implemented a read callback with an udelay(10000U). This fails to
compile on ARM because the delay is >1ms. I doubt that it is needed to
spin for 10ms even if possible on x86.

>From looking at the code, the context appears to be preemptible so using
usleep() should work and avoid busy spinning.

Use readx_poll_timeout() in the poll loop.

Fixes: 8dcf2ad39fdb2 ("net: atlantic: add hwmon getter for MAC temperature")
Cc: Mark Starovoytov <[email protected]>
Cc: Igor Russkikh <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ptp: ptp_clockmatrix: use i2c_master_send for i2c write

The old code for i2c write would break on some controllers, which fails
at handling Repeated Start Condition. So we will just use i2c_master_send
to handle write in one transanction.

Changes since v1:
- Remove indentation change

Signed-off-by: Min Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlink: fix state reallocation in policy export

Evidently, when I did this previously, we didn't have more than
10 policies and didn't run into the reallocation path, because
it's missing a memset() for the unused policies. Fix that.

Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'Bug-fixes-for-ENA-ethernet-driver'

Shay Agroskin says:

====================
Bug fixes for ENA ethernet driver

This series adds the following:
- Fix undesired call to ena_restore after returning from suspend
- Fix condition inside a WARN_ON
- Fix overriding previous value when updating missed_tx statistic

v1->v2:
- fix bug when calling reset routine after device resources are freed (Jakub)

v2->v3:
- fix wrong hash in 'Fixes' tag
====================

Signed-off-by: David S. Miller <[email protected]>

net: ena: Make missed_tx stat incremental

Most statistics in ena driver are incremented, meaning that a stat's
value is a sum of all increases done to it since driver/queue
initialization.

This patch makes all statistics this way, effectively making missed_tx
statistic incremental.
Also added a comment regarding rx_drops and tx_drops to make it
clearer how these counters are calculated.

Fixes: 11095fdb712b ("net: ena: add statistics for missed tx packets")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Change WARN_ON expression in ena_del_napi_in_range()

The ena_del_napi_in_range() function unregisters the napi handler for
rings in a given range.
This function had the following WARN_ON macro:

    WARN_ON(ENA_IS_XDP_INDEX(adapter, i) &&
    adapter->ena_napi[i].xdp_ring);

This macro prints the call stack if the expression inside of it is
true [1], but the expression inside of it is the wanted situation.
The expression checks whether the ring has an XDP queue and its index
corresponds to a XDP one.

This patch changes the expression to
    !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring
which indicates an unwanted situation.

Also, change the structure of the function. The napi handler is
unregistered for all rings, and so there's no need to check whether the
index is an XDP index or not. By removing this check the code becomes
much more readable.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Prevent reset after device destruction

The reset work is scheduled by the timer routine whenever it
detects that a device reset is required (e.g. when a keep_alive signal
is missing).
When releasing device resources in ena_destroy_device() the driver
cancels the scheduling of the timer routine without destroying the reset
work explicitly.

This creates the following bug:
    The driver is suspended and the ena_suspend() function is called
-> This function calls ena_destroy_device() to free the net device
   resources
    -> The driver waits for the timer routine to finish
    its execution and then cancels it, thus preventing from it
    to be called again.

    If, in its final execution, the timer routine schedules a reset,
    the reset routine might be called afterwards,and a redundant call to
    ena_restore_device() would be made.

By changing the reset routine we allow it to read the device's state
accurately.
This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set
before resetting the device and making both the destruction function and
the flag check are under rtnl lock.
The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
routine. Also surround the flag check with 'likely' because
we expect that the reset routine would be called only when
ENA_FLAG_TRIGGER_RESET flag is set.

The destruction of the timer and reset services in __ena_shutoff() have to
stay, even though the timer routine is destroyed in ena_destroy_device().
This is to avoid a case in which the reset routine is scheduled after
free_netdev() in __ena_shutoff(), which would create an access to freed
memory in adapter->flags.

Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

of: address: Work around missing device_type property in pcie nodes

Recent changes to the DT PCI bus parsing made it mandatory for
device tree nodes describing a PCI controller to have the
'device_type = "pci"' property for the node to be matched.

Although this follows the letter of the specification, it
breaks existing device-trees that have been working fine
for years. Rockchip rk3399-based systems are a prime example
of such collateral damage, and have stopped discovering their
PCI bus.

In order to paper over it, let's add a workaround to the code
matching the device type, and accept as PCI any node that is
named "pcie",

A warning will hopefully nudge the user into updating their
DT to a fixed version if they can, but the incentive is
obviously pretty small.

Fixes: 2f96593ecc37 ("of_address: Add bus type match for pci ranges parser")
Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt: writing-schema: Miscellaneous grammar fixes

- Add missing verb,
- Fix accidental plural.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

lib/string.c: Use freestanding environment

gcc can transform the loop in a naive implementation of memset/memcpy
etc into a call to the function itself. This optimization is enabled by
-ftree-loop-distribute-patterns.

This has been the case for a while, but gcc-10.x enables this option at
-O2 rather than -O3 as in previous versions.

Add -ffreestanding, which implicitly disables this optimization with
gcc. It is unclear whether clang performs such optimizations, but
hopefully it will also not do so in a freestanding environment.

Signed-off-by: Arvind Sankar <[email protected]>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
Signed-off-by: Linus Torvalds <[email protected]>

x86/boot/compressed: Use builtin mem functions for decompressor

Since commits

  c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions")
  fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c")

the decompressor stub has been using the compiler's builtin memcpy,
memset and memcmp functions, _except_ where it would likely have the
largest impact, in the decompression code itself.

Remove the #undef's of memcpy and memset in misc.c so that the
decompressor code also uses the compiler builtins.

The rationale given in the comment doesn't really apply: just because
some functions use the out-of-line version is no reason to not use the
builtin version in the rest.

Replace the comment with an explanation of why memzero and memmove are
being #define'd.

Drop the suggestion to #undef in boot/string.h as well: the out-of-line
versions are not really optimized versions, they're generic code that's
good enough for the preboot environment. The compiler will likely
generate better code for constant-size memcpy/memset/memcmp if it is
allowed to.

Most decompressors' performance is unchanged, with the exception of LZ4
and 64-bit ZSTD.

Before After ARCH
LZ4   73ms 10ms   32
LZ4 120ms 10ms 64
ZSTD   90ms 74ms 64

Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.

Decompressor code size has small differences, with the largest being
that 64-bit ZSTD decreases just over 2k. The largest code size increase
was on 64-bit XZ, of about 400 bytes.

Signed-off-by: Arvind Sankar <[email protected]>
Suggested-by: Nick Terrell <[email protected]>
Tested-by: Nick Terrell <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

io_uring: use system_unbound_wq for ring exit work

We currently use system_wq, which is unbounded in terms of number of
workers. This means that if we're exiting tons of rings at the same
time, then we'll briefly spawn tons of event kworkers just for a very
short blocking time as the rings exit.

Use system_unbound_wq instead, which has a sane cap on the concurrency
level.

Signed-off-by: Jens Axboe <[email protected]>

ext4: limit the length of per-inode prealloc list

In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.

After patching, we observed that the sys ratio of cpu has dropped, and
the system throughput has increased significantly. We created a process
to write the sparse file, and the running time of the process on the
fixed kernel was significantly reduced, as follows:

Running time on unfixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m2.051s
user    0m0.008s
sys     0m2.026s

Running time on fixed kernel：
[root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
real    0m0.471s
user    0m0.004s
sys     0m0.395s

Signed-off-by: Chunguang Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: reorganize if statement of ext4_mb_release_context()

Reorganize the if statement of ext4_mb_release_context(), make it
easier to read.

Signed-off-by: Chunguang Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ritesh Harjani <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: add mb_debug logging when there are lost chunks

Lost chunks are when some other process raced with the current thread
to grab a particular block allocation. Add mb_debug log for
developers who wants to see how often this is happening for a
particular workload.

Signed-off-by: Chunguang Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: Fix comment typo "the the".

I have found double typed comments "the the". So i modified it to
one "the"

Signed-off-by: kyoungho koo <[email protected]>
Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: clean up checksum verification in do_one_pass()

Remove the unnecessary chksum_err and checksum_seen variables as well as
some redundant code to make the function easier to understand.

[ With changes suggested by jack@ and tytso@ ]

Signed-off-by: Shijie Luo <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ALSA: hda: avoid reset of sdo_limit

By default 'sdo_limit' is initialized with a default value of '8'
as per spec. This is overridden in cases where a different value is
required. However this is getting reset when snd_hdac_bus_init_chip()
is called again, which happens during runtime PM cycle.

Avoid this reset by moving 'sdo_limit' setup to 'snd_hdac_bus_init()'
function which would be called only once.

Fixes: 67ae482a59e9 ("ALSA: hda: add member to store ratio for stripe control")
Cc: <[email protected]>
Signed-off-by: Sameer Pujar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

drm/i915/tgl: Make sure TC-cold is blocked before enabling TC AUX power wells

The dependency between power wells is determined by the ordering of the
power well list: when enabling the power wells for a domain, this
happens walking the power well list forward, while disabling them
happens in the reverse direction. Accordingly a power well on the list
must follow any other power well it depends on.

Since the TC AUX power wells depend on TC-cold being blocked, move the
TC-cold off power well before all AUX power wells.

Fixes: 3c02934b24e3 ("drm/i915/tc/tgl: Implement TC cold sequences")
Cc: José Roberto de Souza <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit b302a2e68807604af2a5015816c1d117747989b6)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/selftests: Avoid passing a random 0 into ilog2

igt_mm_config() calls ilog2() on the (pseudo)random 21-bit number
s>>12. Once in 2 million seeds, this is zero and ilog2 summons
the nasal demons.

There was an attempt to handle this case with a max(), but that's
too late; ms could already be something bizarre.

Given that the low 12 bits of s and ms are always zero, it's a lot
simpler just to divide them by 4096, then everything fits into 32
bits, and we can easily generate a random number 1 <= s <= 0x1fffff.

Fixes: 14d1b9a6247c ("drm/i915: buddy allocator")
Signed-off-by: George Spelvin <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: [email protected]
Reviewed-by: Matthew Auld <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 21118e8e56479ef33460fbd63a5ad0535843b666)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Fix wrong return value in intel_atomic_check()

In the case of calling check_digital_port_conflicts() failed, a
negative error code -EINVAL should be returned.

Fixes: bf5da83e4bd80 ("drm/i915: Move check_digital_port_conflicts() earier")
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Tianjia Zhang <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 66b51b801d05ee54a0f23628cb8220189adb715e)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Update bw_buddy pagemask table

A recent bspec update removed the LPDDR4 single channel entry from the
buddy register table, but added a new four-channel entry.

Workaround 1409767108 hasn't been updated with any guidance for four
channel configurations, so we leave that alternate table unchanged for
now.

Bspec 49218
Fixes: 3fa01d642fa7 ("drm/i915/tgl: Program BW_BUDDY registers during display init")
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Lucas De Marchi <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit ecb40d0826fda213ebb58d49e7d5b4752480e130)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/display: Check for an LPSP encoder before dereferencing

Avoid a GPF at

<1>[   20.177320] BUG: kernel NULL pointer dereference, address: 000000000000007c
<1>[   20.177322] #PF: supervisor read access in kernel mode
<1>[   20.177323] #PF: error_code(0x0000) - not-present page
<6>[   20.177324] PGD 0 P4D 0
<4>[   20.177327] Oops: 0000 [#1] PREEMPT SMP PTI
<4>[   20.177328] CPU: 1 PID: 944 Comm: debugfs_test Not tainted 5.8.0-rc7-CI-CI_DRM_8814+ #1
<4>[   20.177330] Hardware name: Dell Inc. XPS 13 9360/0823VW, BIOS 2.9.0 07/09/2018
<4>[   20.177372] RIP: 0010:i915_lpsp_capability_show+0x44/0xc0 [i915]
<4>[   20.177374] Code: 0f b6 81 ca 0d 00 00 3c 0b 74 77 76 19 3c 0c 75 44 83 7e 7c 01 7e 2f 48 c7 c6 d7 b9 47 a0 e8 43 df 06 e1 31 c0 c3 3c 09 72 2b <8b> 46 7c 85 c0 75 e6 8b 82 e4 00 00 00 89 c2 83 e2 fb 83 fa 0a 74
<4>[   20.177376] RSP: 0018:ffffc90000cebe38 EFLAGS: 00010246
<4>[   20.177377] RAX: 0000000000000009 RBX: ffff888267fe6a58 RCX: ffff888252d10000
<4>[   20.177378] RDX: ffff88824a9a4000 RSI: 0000000000000000 RDI: ffff888267fe6a30
<4>[   20.177379] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
<4>[   20.177380] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000cebf08
<4>[   20.177381] R13: 00000000ffffffff R14: 0000000000000001 R15: ffff888267fe6a30
<4>[   20.177383] FS:  00007f6f9c6b5e40(0000) GS:ffff888276480000(0000) knlGS:0000000000000000
<4>[   20.177384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   20.177385] CR2: 000000000000007c CR3: 0000000255f04006 CR4: 00000000003606e0
<4>[   20.177386] Call Trace:
<4>[   20.177390]  seq_read+0xcb/0x420

which is presumably from having no encoder attached at that time.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2175
Fixes: 8806211fe7b3 ("drm/i915: Add i915_lpsp_capability debugfs")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Animesh Manna <[email protected]>
Cc: Anshuman Gupta <[email protected]>
Cc: Uma Shankar <[email protected]>
Cc: "Ville Syrjälä" <[email protected]>
Reviewed-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit a22b1a9bb0d72a58d5b836653f28d97ee8fea1c4)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Copy default modparams to mock i915_device

Since we use the module parameters stored inside the drm_i915_device
itself, we need to ensure the mock i915_device also sets up the right
defaults.

Fixes: 8a25c4be583d ("drm/i915/params: switch to device specific parameters")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Jani Nikula <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 98ef067453709444a264939940f7b3a5dfdfa09e)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Provide the perf pmu.module

Rather than manually implement our own module reference counting for perf
pmu events, finally realise that there is a module parameter to struct
pmu for this very purpose.

Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 27e897beec1c59861f15d4d3562c39ad1143620f)
Signed-off-by: Jani Nikula <[email protected]>

Merge tag 'gvt-next-fixes-2020-08-05' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-next-fixes-2020-08-05

- Fix guest suspend/resume low performance handling of shadow ppgtt (Colin)
- Fix PV notifier handling for guest suspend/resume (Colin)

Signed-off-by: Jani Nikula <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion

The Galaxy Book Ion uses the same ALC298 codec as other Samsung laptops
which have the no headphone sound bug, like my Samsung Notebook. The
Galaxy Book owner confirmed that this patch fixes the bug.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207423
Signed-off-by: Mike Pozulp <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'asoc-fix-v5.9-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.9

A bunch of fixes that came in during the merge window, mostly for issues
that were uncovered by the changes to report errors on invalid register
access plus one important fix in that code itself.

Merge tag 'amd-drm-fixes-5.9-2020-08-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.9-2020-08-12:

amdgpu:
- Fix allocation size
- SR-IOV fixes
- Vega20 SMU feature state caching fix
- Fix custom pptable handling
- Arcturus golden settings update
- Several display fixes

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-misc-fixes-2020-08-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.9-rc1:
- Add missing dma_fence_put() in virtio_gpu_execbuffer_ioctl().
- Fix memory leak in virtio_gpu_cleanup_object().

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

bpftool: Handle EAGAIN error code properly in pids collection

When the error code is EAGAIN, the kernel signals the user
space should retry the read() operation for bpf iterators.
Let us do it.

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Avoid visit same object multiple times

Currently when traversing all tasks, the next tid
is always increased by one. This may result in
visiting the same task multiple times in a
pid namespace.

This patch fixed the issue by seting the next
tid as pid_nr_ns(pid, ns) + 1, similar to
funciton next_tgid().

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator

In our production system, we observed rcu stalls when
'bpftool prog` is running.
  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
  \x09(t=21031 jiffies g=2534773 q=179750)
  NMI backtrace for cpu 7
  CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
  Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
  Call Trace:
  <IRQ>
  dump_stack+0x57/0x70
  nmi_cpu_backtrace.cold+0x14/0x53
  ? lapic_can_unplug_cpu.cold+0x39/0x39
  nmi_trigger_cpumask_backtrace+0xb7/0xc7
  rcu_dump_cpu_stacks+0xa2/0xd0
  rcu_sched_clock_irq.cold+0x1ff/0x3d9
  ? tick_nohz_handler+0x100/0x100
  update_process_times+0x5b/0x90
  tick_sched_timer+0x5e/0xf0
  __hrtimer_run_queues+0x12a/0x2a0
  hrtimer_interrupt+0x10e/0x280
  __sysvec_apic_timer_interrupt+0x51/0xe0
  asm_call_on_stack+0xf/0x20
  </IRQ>
  sysvec_apic_timer_interrupt+0x6f/0x80
  asm_sysvec_apic_timer_interrupt+0x12/0x20
  RIP: 0010:task_file_seq_get_next+0x71/0x220
  Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
  RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
  RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
  RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
  RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
  R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
  R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
  task_file_seq_next+0x52/0xa0
  bpf_seq_read+0xb9/0x320
  vfs_read+0x9d/0x180
  ksys_read+0x5f/0xe0
  do_syscall_64+0x38/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f8815f4f76e
  Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
  RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
  RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
  RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
  R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
  R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e

Note that `bpftool prog` actually calls a task_file bpf iterator
program to establish an association between prog/map/link/btf anon
files and processes.

In the case where the above rcu stall occured, we had a process
having 1587 tasks and each task having roughly 81305 files.
This implied 129 million bpf prog invocations. Unfortunwtely none of
these files are prog/map/link/btf files so bpf iterator/prog needs
to traverse all these files and not able to return to user space
since there are no seq_file buffer overflow.

This patch fixed the issue in bpf_seq_read() to limit the number
of visited objects. If the maximum number of visited objects is
reached, no more objects will be visited in the current syscall.
If there is nothing written in the seq_file buffer, -EAGAIN will
return to the user so user can try again.

The maximum number of visited objects is set at 1 million.
In our Intel Xeon D-2191 2.3GHZ 18-core server, bpf_seq_read()
visiting 1 million files takes around 0.18 seconds.

We did not use cond_resched() since for some iterators, e.g.,
netlink iterator, where rcu read_lock critical section spans between
consecutive seq_ops->next(), which makes impossible to do cond_resched()
in the key while loop of function bpf_seq_read().

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net: ipv4: remove duplicate "the the" phrase in Kconfig text

The Kconfig help text contains the phrase "the the" in the help
text. Fix this and reformat the block of help text.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mscc: ocelot: remove duplicate "the the" phrase in Kconfig text

The Kconfig help text contains the phrase "the the" in the help
text. Fix this.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ethtool-netlink-bug-fixes'

Maxim Mikityanskiy says:

====================
ethtool-netlink bug fixes

This series contains a few bug fixes for ethtool-netlink. These bugs are
specific for the netlink interface, and the legacy ioctl interface is
not affected. These patches aim to have the same behavior in
ethtool-netlink as in the legacy ethtool.

Please also see the sibling series for the userspace tool.

v2 changes: Added Fixes tags.
====================

Signed-off-by: David S. Miller <[email protected]>

ethtool: Don't omit the netlink reply if no features were changed

The legacy ethtool userspace tool shows an error when no features could
be changed. It's useful to have a netlink reply to be able to show this
error when __netdev_update_features wasn't called, for example:

1. ethtool -k eth0
   large-receive-offload: off
2. ethtool -K eth0 rx-fcs on
3. ethtool -K eth0 lro on
   Could not change any device features
   rx-lro: off [requested on]
4. ethtool -K eth0 lro on
   # The output should be the same, but without this patch the kernel
   # doesn't send the reply, and ethtool is unable to detect the error.

This commit makes ethtool-netlink always return a reply when requested,
and it still avoids unnecessary calls to __netdev_update_features if the
wanted features haven't changed.

Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: Account for hw_features in netlink interface

ethtool-netlink ignores dev->hw_features and may confuse the drivers by
asking them to enable features not in the hw_features bitmask. For
example:

1. ethtool -k eth0
   tls-hw-tx-offload: off [fixed]
2. ethtool -K eth0 tls-hw-tx-offload on
   tls-hw-tx-offload: on
3. ethtool -k eth0
   tls-hw-tx-offload: on [fixed]

Fitler out dev->hw_features from req_wanted to fix it and to resemble
the legacy ethtool behavior.

Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: Fix preserving of wanted feature bits in netlink interface

Currently, ethtool-netlink calculates new wanted bits as:
(req_wanted & req_mask) | (old_active & ~req_mask)

It completely discards the old wanted bits, so they are forgotten with
the next ethtool command. Sample steps to reproduce:

1. ethtool -k eth0
   tx-tcp-segmentation: on # TSO is on from the beginning
2. ethtool -K eth0 tx off
   tx-tcp-segmentation: off [not requested]
3. ethtool -k eth0
   tx-tcp-segmentation: off [requested on]
4. ethtool -K eth0 rx off # Some change unrelated to TSO
5. ethtool -k eth0
   tx-tcp-segmentation: off # "Wanted on" is forgotten

This commit fixes it by changing the formula to:
(req_wanted & req_mask) | (old_wanted & ~req_mask),
where old_active was replaced by old_wanted to account for the wanted
bits.

The shortcut condition for the case where nothing was changed now
compares wanted bitmasks, instead of wanted to active.

Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: some fixes for ipv6_dev_find()

This patch is to do 3 things for ipv6_dev_find():

  As David A. noticed,

  - rt6_lookup() is not really needed. Different from __ip_dev_find(),
    ipv6_dev_find() doesn't have a compatibility problem, so remove it.

  As Hideaki suggested,

  - "valid" (non-tentative) check for the address is also needed.
    ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will
    traverse the address hash list, but it's heavy to be called
    inside ipv6_dev_find(). This patch is to reuse the code of
    ipv6_chk_addr_and_flags() for ipv6_dev_find().

  - dev parameter is passed into ipv6_dev_find(), as link-local
    addresses from user space has sin6_scope_id set and the dev
    lookup needs it.

Fixes: 81f6cb31222d ("ipv6: add ipv6_dev_find()")
Suggested-by: YOSHIFUJI Hideaki <[email protected]>
Reported-by: David Ahern <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: fix active-backup failover for current ARP slave

When the ARP monitor is used for link detection, ARP replies are
validated for all slaves (arp_validate=3) and fail_over_mac is set to
active, two slaves of an active-backup bond may get stuck in a state
where both of them are active and pass packets that they receive to
the bond. This state makes IPv6 duplicate address detection fail. The
state is reached thus:
1. The current active slave goes down because the ARP target
is not reachable.
2. The current ARP slave is chosen and made active.
3. A new slave is enslaved. This new slave becomes the current active
slave and can reach the ARP target.
As a result, the current ARP slave stays active after the enslave
action has finished and the log is littered with "PROBE BAD" messages:
> bond0: PROBE: c_arp ens10 && cas ens11 BAD
The workaround is to remove the slave with "going back" status from
the bond and re-enslave it. This issue was encountered when DPDK PMD
interfaces were being enslaved to an active-backup bond.

I would be possible to fix the issue in bond_enslave() or
bond_change_active_slave() but the ARP monitor was fixed instead to
keep most of the actions changing the current ARP slave in the ARP
monitor code. The current ARP slave is set as inactive and backup
during the commit phase. A new state, BOND_LINK_FAIL, has been
introduced for slaves in the context of the ARP monitor. This allows
administrators to see how slaves are rotated for sending ARP requests
and attempts are made to find a new active slave.

Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor")
Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: handle the return value of pskb_carve_frag_list() correctly

pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear().
we should handle this correctly or we would get wrong sk_buff.

Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/amd/display: fix pow() crashing when given base 0

[Why&How]
pow(a,x) is implemented as exp(x*log(a)). log(0) will crash.
So return 0^x = 0, unless x=0, convention seems to be 0^0 = 1.

Cc: [email protected]
Signed-off-by: Krunoslav Kovac <[email protected]>
Reviewed-by: Anthony Koo <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Reset scrambling on Test Pattern

[Why]
Programming is missing the sequence where for eDP the scrambling is
reset when testing for eye diagram test pattern.

[How]
Include the required register in the definition

Signed-off-by: Chris Park <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix dcn3 wide timing dsc validation

Wide timing DSC requires odm. Since spreadsheet is missing this dsc
validation we have to modify DML vba code ourselves.

Signed-off-by: Dmytro Laktyushkin <[email protected]>
Reviewed-by: Wesley Chalmers <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix DFPstate hang due to view port changed

[Why]
Place the cursor in the center of screen between two pipes then
adjusting the viewport but cursour doesn't update cause DFPstate hang.

[How]
If viewport changed, update cursor as well.

Cc: [email protected]
Signed-off-by: Paul Hsieh <[email protected]>
Reviewed-by: Aric Cyr <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Assign correct left shift

[Why]
Reading for DP alt registers return incorrect values due to LE_SF
definition missing.

[How]
Define correct LE_SF or DP alt registers.

Signed-off-by: Chris Park <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Call DMUB for eDP power control

[Why]
If DMUB is used, LVTMA VBIOS call can be used to control eDP instead of
tranditional transmitter control. Interface is agreed with VBIOS for
eDP to use this new path to program LVTMA registers.

[How]
Create DAL interface to send DMUB command for LVTMA as currently
implemented in VBIOS.

Signed-off-by: Chris Park <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix the wrong sdma instance query for renoir

Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.

Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: parse ta firmware for navy_flounder

Use the same case as sienna_cichlid

Signed-off-by: Bhawanpreet Lakha <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Merge tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A bunch of fixes that came in for SPI during the merge window.

  Some from ST and others for their controller, one from Lukas for a
  race between device addition and controller unregistration and one
  from fix from Geert for the DT bindings which unbreaks validation"

* tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  dt-bindings: lpspi: Add missing boolean type for fsl,spi-only-use-cs1-sel
  spi: stm32: always perform registers configuration prior to transfer
  spi: stm32: fixes suspend/resume management
  spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate
  spi: stm32: fix fifo threshold level in case of short transfer
  spi: stm32h7: fix race condition at end of transfer
  spi: stm32: clear only asserted irq flags on interrupt
  spi: Prevent adding devices below an unregistering controller

io_uring: cleanup io_import_iovec() of pre-mapped request

io_rw_prep_async() goes through a dance of clearing req->io, calling
the iovec import, then re-setting req->io. Provide an internal helper
that does the right thing without needing state tweaked to get there.

This enables further cleanups in io_read, io_write, and
io_resubmit_prep(), but that's left for another time.

Signed-off-by: Jens Axboe <[email protected]>

drm/amdgpu: fix NULL pointer access issue when unloading driver

When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G           OE     5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS:  00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
simple_recursive_removal+0x63/0x370
? debugfs_remove+0x60/0x60
debugfs_remove+0x40/0x60
amdgpu_ras_fini+0x82/0x230 [amdgpu]
? __kernfs_remove.part.17+0x101/0x1f0
? kernfs_name_hash+0x12/0x80
amdgpu_device_fini+0x1c0/0x580 [amdgpu]
amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
amdgpu_pci_remove+0x36/0x60 [amdgpu]
pci_device_remove+0x3b/0xb0
device_release_driver_internal+0xe5/0x1c0
driver_detach+0x46/0x90
bus_remove_driver+0x58/0xd0
pci_unregister_driver+0x29/0x90
amdgpu_exit+0x11/0x25 [amdgpu]
__x64_sys_delete_module+0x13d/0x210
do_syscall_64+0x5f/0x250
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix uninit-value in arcturus_log_thermal_throttling_event()

when function arcturus_get_smu_metrics_data() call failed,
it will cause the variable "throttler_status" isn't initialized before use.

warning:
powerplay/arcturus_ppt.c:2268:24: warning: ‘throttler_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
2268 | if (throttler_status & logging_label[throttler_idx].feature_mask) {

Signed-off-by: Kevin Wang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: disable gfxoff for navy_flounder

gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

net: gianfar: Add of_node_put() before goto statement

Every iteration of for_each_available_child_of_node() decrements
reference count of the previous node, however when control
is transferred from the middle of the loop, as in the case of
a return or break or goto, there is no decrement thus ultimately
resulting in a memory leak.

Fix a potential memory leak in gianfar.c by inserting of_node_put()
before the goto statement.

Issue found with Coccinelle.

Signed-off-by: Sumera Priyadarsini <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'cxgb4-Fix-ethtool-selftest-flits-calculation'

Ganji Aravind says:

====================
cxgb4: Fix ethtool selftest flits calculation

Patch 1 will fix work request size calculation for loopback selftest.

Patch 2 will fix race between loopback selftest and normal Tx handler.
====================

Signed-off-by: David S. Miller <[email protected]>

cxgb4: Fix race between loopback and normal Tx path

Even after Tx queues are marked stopped, there exists a
small window where the current packet in the normal Tx
path is still being sent out and loopback selftest ends
up corrupting the same Tx ring. So, ensure selftest takes
the Tx lock to synchronize access the Tx ring.

Fixes: 7235ffae3d2c ("cxgb4: add loopback ethtool self-test")
Signed-off-by: Ganji Aravind <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Fix work request size calculation for loopback test

Work request used for sending loopback packet needs to add
the firmware work request only once. So, fix by using
correct structure size.

Fixes: 7235ffae3d2c ("cxgb4: add loopback ethtool self-test")
Signed-off-by: Ganji Aravind <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sfc-more-EF100-fixes'

Edward Cree says:

====================
sfc: more EF100 fixes

Fix up some bugs in the initial EF100 submission, and re-fix
the hash_valid fix which was incomplete.

The reset bugs are currently hard to trigger; they were found
with an in-progress patch adding ethtool support, whereby
ethtool --reset reliably reproduces them.
====================

Signed-off-by: David S. Miller <[email protected]>

sfc: don't free_irq()s if they were never requested

If efx_nic_init_interrupt fails, or was never run (e.g. due to an earlier
failure in ef100_net_open), freeing irqs in efx_nic_fini_interrupt is not
needed and will cause error messages and stack traces.
So instead, only do this if efx_nic_init_interrupt successfully completed,
as indicated by the new efx->irqs_hooked flag.

Fixes: 965b549f3c20 ("sfc_ef100: implement ndo_open/close and EVQ probing")
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: null out channel->rps_flow_id after freeing it

If an ef100_net_open() fails, ef100_net_stop() may be called without
channel->rps_flow_id having been written; thus it may hold the address
freed by a previous ef100_net_stop()'s call to efx_remove_filters().
This then causes a double-free when efx_remove_filters() is called
again, leading to a panic.
To prevent this, after freeing it, overwrite it with NULL.

Fixes: a9dc3d5612ce ("sfc_ef100: RX filter table management and related gubbins")
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: take correct lock in ef100_reset()

When downing and upping the ef100 filter table, we need to take a write
lock on efx->filter_sem, not just a read lock, because we may kfree()
the table pointers.
Without this, resets cause a WARN_ON from efx_rwsem_assert_write_locked().

Fixes: a9dc3d5612ce ("sfc_ef100: RX filter table management and related gubbins")
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: really check hash is valid before using it

Actually hook up the .rx_buf_hash_valid method in EF100's nic_type.

Fixes: 068885434ccb ("sfc: check hash is valid before using it")
Reported-by: Martin Habets <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

macvlan: validate setting of multiple remote source MAC addresses

Remote source MAC addresses can be set on a 'source mode' macvlan
interface via the IFLA_MACVLAN_MACADDR_DATA attribute. This commit
tightens the validation of these MAC addresses to match the validation
already performed when setting or adding a single MAC address via the
IFLA_MACVLAN_MACADDR attribute.

iproute2 uses IFLA_MACVLAN_MACADDR_DATA for its 'macvlan macaddr set'
command, and IFLA_MACVLAN_MACADDR for its 'macvlan macaddr add' command,
which demonstrates the inconsistent behaviour that this commit
addresses:

# ip link add link eth0 name macvlan0 type macvlan mode source
# ip link set link dev macvlan0 type macvlan macaddr add 01:00:00:00:00:00
RTNETLINK answers: Cannot assign requested address
# ip link set link dev macvlan0 type macvlan macaddr set 01:00:00:00:00:00
# ip -d link show macvlan0
5: macvlan0@eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 ...
link/ether 2e:ac:fd:2d:69:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvlan mode source remotes (1) 01:00:00:00:00:00 numtxqueues 1 ...

With this change, the 'set' command will (rightly) fail in the same way
as the 'add' command.

Signed-off-by: Alvin Šipraga <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull ia64 page table fix from Mike Rapoport:
"Fix regression in IA-64 caused by page table allocation refactoring

  The refactoring and consolidation of <asm/pgalloc.h> caused regression
  on parisc and ia64. The fix for parisc made it into v5.9-rc1 while the
  fix ia64 got delayed a bit and here it is"

* tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  arch/ia64: Restore arch-specific pgd_offset_k implementation