Git Repo - linux.git/log

net: stmmac: Determine earlier the size of RX buffer

Split Header feature needs to know the size of RX buffer but current
code is determining it too late. Fix this by moving the RX buffer
computation to earlier stage.

Changes from v2:
- Do not try to align already aligned buffer size

Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: selftests: Needs to check the number of Multicast regs

When running the MC and UC filter tests we setup a multicast address
that its expected to be blocked. If the number of available multicast
registers is zero, driver will always pass the multicast packets which
will fail the test.

Check if available multicast addresses is enough before running the
tests.

Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()

The kernel may sleep while holding a spinlock.
The function call path (from bottom to top) in Linux 4.19 is:

net/nfc/nci/uart.c, 349:
nci_skb_alloc in nci_uart_default_recv_buf
net/nfc/nci/uart.c, 255:
(FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
net/nfc/nci/uart.c, 254:
spin_lock in nci_uart_tty_receive

nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
(FUNC_PTR) means a function pointer is called.

To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
nci_skb_alloc().

This bug is found by a static analysis tool STCheck written by myself.

Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

io_uring: io_wq_submit_work() should not touch req->rw

I've been chasing a weird and obscure crash that was userspace stack
corruption, and finally narrowed it down to a bit flip that made a
stack address invalid. io_wq_submit_work() unconditionally flips
the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
handler, this isn't valid. Normal read/write operations own that
part of the request, on other types it could be something else.

Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.

Signed-off-by: Jens Axboe <[email protected]>

usb: xhci: Fix build warning seen with CONFIG_PM=n

The following build warning is seen if CONFIG_PM is disabled.

drivers/usb/host/xhci-pci.c:498:13: warning:
unused function 'xhci_pci_shutdown'

Fixes: f2c710f7dca8 ("usb: xhci: only set D3hot for pci device")
Cc: Henry Lin <[email protected]>
Cc: [email protected] # all stable releases with f2c710f7dca8
Signed-off-by: Guenter Roeck <[email protected]>
Acked-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

clk: Move clk_core_reparent_orphans() under CONFIG_OF

A recent addition exposed a helper that is only used for CONFIG_OF. Move
it into the CONFIG_OF zone in this file to make the compiler stop
warning about an unused function.

Fixes: 66d9506440bb ("clk: walk orphan list on clock provider registration")
Signed-off-by: Olof Johansson <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: "Simply" move the function instead]
Signed-off-by: Stephen Boyd <[email protected]>

io_uring: don't wait when under-submitting

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

Don't wait/poll if can't submit all sqes

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A slightly high amount at this time, but all good and small fixes:

   - A PCM core fix that initializes the buffer properly for avoiding
     information leaks; it is a long-standing minor problem, but good to
     fix better now

   - A few ASoC core fixes for the init / cleanup ordering issues that
     surfaced after the recent refactoring

   - Lots of SOF and topology-related fixes went in, as usual as such
     hot topics

   - Several ASoC codec and platform-specific small fixes: wm89xx,
     realtek, and max98090, AMD, Intel-SST

   - A fix for the previous incomplete regression of HD-audio, now
     hitting Nvidia HDMI

   - A few HD-audio CA0132 codec fixes"

* tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
  ALSA: hda - Downgrade error message for single-cmd fallback
  ASoC: wm8962: fix lambda value
  ALSA: hda: Fix regression by strip mask fix
  ALSA: hda/ca0132 - Fix work handling in delayed HP detection
  ALSA: hda/ca0132 - Avoid endless loop
  ALSA: hda/ca0132 - Keep power on during processing DSP response
  ALSA: pcm: Avoid possible info leaks from PCM stream buffers
  ASoC: Intel: common: work-around incorrect ACPI HID for CML boards
  ASoC: SOF: Intel: split cht and byt debug window sizes
  ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data
  ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
  ASoC: simple-card: Don't create separate link when platform is present
  ASoC: topology: Check return value for soc_tplg_pcm_create()
  ASoC: topology: Check return value for snd_soc_add_dai_link()
  ASoC: core: only flush inited work during free
  ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
  ASoC: core: Init pcm runtime work early to avoid warnings
  ASoC: Intel: sst: Add missing include <linux/io.h>
  ASoC: max98090: fix possible race conditions
  ASoC: max98090: exit workaround earlier if PLL is locked
  ...

Merge tag 'kvmarm-fixes-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm fixes for .5.5, take #1

- Fix uninitialised sysreg accessor
- Fix handling of demand-paged device mappings
- Stop spamming the console on IMPDEF sysregs
- Relax mappings of writable memslots
- Assorted cleanups

kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD

The host reports support for the synthetic feature X86_FEATURE_SSBD
when any of the three following hardware features are set:
  CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31]
  CPUID.80000008H:EBX.AMD_SSBD[bit 24]
  CPUID.80000008H:EBX.VIRT_SSBD[bit 25]

Either of the first two hardware features implies the existence of the
IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does
not. Therefore, CPUID.80000008H:EBX.AMD_SSBD[bit 24] should only be
set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or
CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host.

Fixes: 4c6903a0f9d76 ("KVM: x86: fix reporting of AMD speculation bug CPUID leaf")
Signed-off-by: Jim Mattson <[email protected]>
Reviewed-by: Jacob Xu <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: [email protected]
Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD

The host reports support for the synthetic feature X86_FEATURE_SSBD
when any of the three following hardware features are set:
  CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31]
  CPUID.80000008H:EBX.AMD_SSBD[bit 24]
  CPUID.80000008H:EBX.VIRT_SSBD[bit 25]

Either of the first two hardware features implies the existence of the
IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does
not. Therefore, CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] should only be
set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or
CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host.

Fixes: 0c54914d0c52a ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code")
Signed-off-by: Jim Mattson <[email protected]>
Reviewed-by: Jacob Xu <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: [email protected]
Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

iommu/dma: Relax locking in iommu_dma_prepare_msi()

Since commit ece6e6f0218b ("iommu/dma-iommu: Split iommu_dma_map_msi_msg()
in two parts"), iommu_dma_prepare_msi() should no longer have to worry
about preempting itself, nor being called in atomic context at all. Thus
we can downgrade the IRQ-safe locking to a simple mutex to avoid angering
the new might_sleep() check in iommu_map().

Reported-by: Qian Cai <[email protected]>
Tested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

perf/smmuv3: Remove the leftover put_cpu() in error path

In smmu_pmu_probe(), there is put_cpu() in the error path,
which is wrong because we use raw_smp_processor_id() to
get the cpu ID, not get_cpu(), remove it.

While we are at it, kill 'out_cpuhp_err' altogether and
just return err if we fail to add the hotplug instance.

Acked-by: Robin Murphy <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Hanjun Guo <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

iommu/vt-d: Remove incorrect PSI capability check

The PSI (Page Selective Invalidation) bit in the capability register
is only valid for second-level translation. Intel IOMMU supporting
scalable mode must support page/address selective IOTLB invalidation
for first-level translation. Remove the PSI capability check in SVA
cache invalidation code.

Fixes: 8744daf4b0699 ("iommu/vt-d: Remove global page flush support")
Cc: Jacob Pan <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

staging: wfx: fix wrong error message

The driver checks that the number of retries made by the device is
coherent with the rate policy. However, this check make sense only if
the device has returned RETRY_EXCEEDED.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: fix hif_set_mfp() with big endian hosts

struct hif_mib_protected_mgmt_policy is an array of u8. There is no
reason to swap its bytes.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: detect race condition in WEP authentication

Current code has a special case to handle association with WEP. Before
to rework the tx data handling, let's try to detect any possible misuse
of this code.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps

When not using HT mode, minstrel always includes 1Mbps as fallback rate.
But, when using HT mode, this fallback is not included. Yet, it seems
that it could save some frames. So, this patch add it unconditionally.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: fix rate control handling

A tx_retry_policy (the equivalent of a list of ieee80211_tx_rate in
hardware API) is not able to include a rate multiple time. So currently,
the driver merges the identical rates from the policy provided by
minstrel (and it try to do the best choice it can in the associated
flags) before to sent it to firmware.

Until now, when rates are merged, field "count" is set to
max(count1, count2). But, it means that the sum of retries for all rates
could be far less than initial number of retries. So, this patch changes
the value of field "count" to count1 + count2. Thus, sum of all retries
for all rates stay the same.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: firmware does not support more than 32 total retries

The sum of all retries for a Tx frame cannot be superior to 32.

There are 4 rates at most. So this patch limits number of retries per
rate to 8.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: use boolean appropriately

The field 'uploaded' is used as a boolean, so call it a boolean.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: fix counter overflow

Some weird behaviors were observed when connection is really good and
packets are small. It appears that sometime, number of packets in queues
can exceed 255 and generate an overflow in field usage_count.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: fix case of lack of tx_retry_policies

In some rare cases, driver may not have any available tx_retry_policies.
In this case, the driver asks to mac80211 to stop sending data. However,
it seems that a race is possible and a few frames can be sent to the
driver. In this case, driver can't wait for free tx_retry_policies since
wfx_tx() must be atomic. So, this patch fix this case by sending these
frames with the special policy number 15.

The firmware normally use policy 15 to send internal frames (PS-poll,
beacons, etc...). So, it is not a so bad fallback.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: wfx: fix the cache of rate policies on interface reset

Device and driver maintain a cache of rate policies (aka.
tx_retry_policy in hardware API).

When hif_reset() is sent to hardware, device resets its cache of rate
policies. In order to keep driver in sync, it is necessary to do the
same on driver.

Note, when driver tries to use a rate policy that has not been defined
on device, data is sent at 1Mbps. So, this patch should fix abnormal
throughput observed sometime after a reset of the interface.

Signed-off-by: Jérôme Pouiller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kconfig: remove ---help--- from documentation

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), scripts/checkpatch.pl warns the use of ---help---.

Kconfig still supports ---help---, but new code should avoid using it.
Let's stop advertising it in documentation.

Signed-off-by: Masahiro Yamada <[email protected]>

mmc: sdhci: Add a quirk for broken command queuing

Command queuing has been reported broken on some systems based on Intel
GLK. A separate patch disables command queuing in some cases.

This patch adds a quirk for broken command queuing, which enables users
with problems to disable command queuing using sdhci module parameters for
quirks.

Fixes: 8ee82bda230f ("mmc: sdhci-pci: Add CQHCI support for Intel GLK")
Signed-off-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

mmc: sdhci: Workaround broken command queuing on Intel GLK

Command queuing has been reported broken on some Lenovo systems based on
Intel GLK. This is likely a BIOS issue, so disable command queuing for
Intel GLK if the BIOS vendor string is "LENOVO".

Fixes: 8ee82bda230f ("mmc: sdhci-pci: Add CQHCI support for Intel GLK")
Signed-off-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

mmc: sdhci-of-esdhc: fix P2020 errata handling

Two previous patches introduced below quirks for P2020 platforms.
- SDHCI_QUIRK_RESET_AFTER_REQUEST
- SDHCI_QUIRK_BROKEN_TIMEOUT_VAL

The patches made a mistake to add them in quirks2 of sdhci_host
structure, while they were defined for quirks.
host->quirks2 |= SDHCI_QUIRK_RESET_AFTER_REQUEST;
host->quirks2 |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;

This patch is to fix them.
host->quirks |= SDHCI_QUIRK_RESET_AFTER_REQUEST;
host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;

Fixes: 05cb6b2a66fa ("mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support")
Fixes: a46e42712596 ("mmc: sdhci-of-esdhc: add erratum eSDHC5 support")
Signed-off-by: Yangbo Lu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

Merge tag 'gvt-fixes-2019-12-18' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2019-12-18

- vGPU state setting locking fix (Zhenyu)
- Fix vGPU display dmabuf as read-only (Zhenyu)
- Properly handle vGPU display dmabuf page pin when rendering (Tina)
- Fix one guest boot warning to handle guc reset state (Fred)

Signed-off-by: Joonas Lahtinen <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915: Fix pid leak with banned clients

Get_pid_task() needs to be paired with a put_pid or we leak a pid
reference every time a banned client tries to create a context.

v2:
* task_pid_nr helper exists! (Chris)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: b083a0870c79 ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ba16a48af797db124ac100417f9229b1650ce1fb)
Signed-off-by: Joonas Lahtinen <[email protected]>

drm/i915/gem: Keep request alive while attaching fences

Since commit e5dadff4b093 ("drm/i915: Protect request retirement with
timeline->mutex"), the request retirement can happen outside of the
struct_mutex serialised only by the timeline->mutex. We drop the
timeline->mutex on submitting the request (i915_request_add) so after
that point, it is liable to be freed. Make sure our local reference is
kept alive until we have finished attaching it to the signalers. (Note
that this erodes the argument that i915_request_add should consume the
reference, but that is a slightly larger patch!)

Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e14177f19739d74839eb496a27f5f5d958beaa5b)
Signed-off-by: Joonas Lahtinen <[email protected]>

net-sysfs: Call dev_hold always in rx_queue_add_kobject

Dev_hold has to be called always in rx_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.

Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: syzbot <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: David Miller <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Signed-off-by: Jouni Hogander <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: flower: fix stats id allocation

As flower rules are added, they are given a stats ID based on the number
of rules that can be supported in firmware. Only after the initial
allocation of all available IDs does the driver begin to reuse those that
have been released.

The initial allocation of IDs was modified to account for multiple memory
units on the offloaded device. However, this introduced a bug whereby the
counter that controls the IDs could be decremented before the ID was
assigned (where it is further decremented). This means that the stats ID
could be assigned as -1/0xfffffff which is out of range.

Fix this by only decrementing the main counter after the current ID has
been assigned.

Fixes: 467322e2627f ("nfp: flower: support multiple memory units for filter offloads")
Signed-off-by: John Hurley <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: make unexported dsa_link_touch() static

dsa_link_touch() is not exported, or defined outside of the
file it is in so make it static to avoid the following warning:

net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ag71xx: fix compile warnings

drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_probe':
drivers/net/ethernet/atheros/ag71xx.c:1776:30: warning: passing argument 2 of
'of_get_phy_mode' makes pointer from integer without a cast [-Wint-conversion]
In file included from drivers/net/ethernet/atheros/ag71xx.c:33:
./include/linux/of_net.h:15:69: note: expected 'phy_interface_t *'
{aka 'enum <anonymous> *'} but argument is of type 'int'

Fixes: 0c65b2b90d13c1 ("net: of_get_phy_mode: Change API to solve int/unit warnings")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix kernel-doc warning in <linux/netdevice.h>

Fix missing '*' kernel-doc notation that causes this warning:

../include/linux/netdevice.h:1779: warning: bad line: spinlock

Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: annotate lockless accesses to sk->sk_pacing_shift

sk->sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complains.

This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could remove the conditional check and always
write over sk->sk_pacing_shift :

if (sk->sk_pacing_shift != val)
sk->sk_pacing_shift = val;

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qlogic: Fix error paths in ql_alloc_large_buffers()

ql_alloc_large_buffers() has the usual RX buffer allocation
loop where it allocates skbs and maps them for DMA.  It also
treats failure as a fatal error.

There are (at least) three bugs in the error paths:

1. ql_free_large_buffers() assumes that the lrg_buf[] entry for the
first buffer that couldn't be allocated will have .skb == NULL.
But the qla_buf[] array is not zero-initialised.

2. ql_free_large_buffers() DMA-unmaps all skbs in lrg_buf[].  This is
incorrect for the last allocated skb, if DMA mapping failed.

3. Commit 1acb8f2a7a9f ("net: qlogic: Fix memory leak in
ql_alloc_large_buffers") added a direct call to dev_kfree_skb_any()
after the skb is recorded in lrg_buf[], so ql_free_large_buffers()
will double-free it.

The bugs are somewhat inter-twined, so fix them all at once:

* Clear each entry in qla_buf[] before attempting to allocate
  an skb for it.  This goes half-way to fixing bug 1.
* Set the .skb field only after the skb is DMA-mapped.  This
  fixes the rest.

Fixes: 1357bfcf7106 ("qla3xxx: Dynamically size the rx buffer queue ...")
Fixes: 0f8ab89e825f ("qla3xxx: Check return code from pci_map_single() ...")
Fixes: 1acb8f2a7a9f ("net: qlogic: Fix memory leak in ql_alloc_large_buffers")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: fix memleak on err handling of stream initialization

syzbot reported a memory leak when an allocation fails within
genradix_prealloc() for output streams. That's because
genradix_prealloc() leaves initialized members initialized when the
issue happens and SCTP stack will abort the current initialization but
without cleaning up such members.

The fix here is to always call genradix_free() when genradix_prealloc()
fails, for output and also input streams, as it suffers from the same
issue.

Reported-by: [email protected]
Fixes: 2075e50caf5e ("sctp: convert to genradix")
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Tested-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fs: call fsnotify_sb_delete after evict_inodes

When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches. This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.

At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.

After that, the call to evict_inodes will evict everything else with a
zero refcount.

This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fs: avoid softlockups in s_inodes iterators

Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.

One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>

lib/Kconfig.debug: fix some messed up configurations

Some configuration items are messed up during conflict resolving. For
example, STRICT_DEVMEM should not in testing menu, but kunit should.
This patch fixes all of them.

[[email protected]: coding style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG

Since commit 0a432dcbeb32 ("mm: shrinker: make shrinker not depend on
memcg kmem"), shrinkers' idr is protected by CONFIG_MEMCG instead of
CONFIG_MEMCG_KMEM, so it makes no sense to protect shrinker idr replace
with CONFIG_MEMCG_KMEM.

And in the CONFIG_MEMCG && CONFIG_SLOB case, shrinker_idr contains only
shrinker, and it is deferred_split_shrinker. But it is never actually
called, since idr_replace() is never compiled due to the wrong #ifdef.
The deferred_split_shrinker all the time is staying in half-registered
state, and it's never called for subordinate mem cgroups.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0a432dcbeb32 ("mm: shrinker: make shrinker not depend on memcg kmem")
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: <[email protected]> [5.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: don't assume percpu shadow allocations will succeed

syzkaller and the fault injector showed that I was wrong to assume that
we could ignore percpu shadow allocation failures.

Handle failures properly. Merge all the allocated areas back into the
free list and release the shadow, then clean up and return NULL. The
shadow is released unconditionally, which relies upon the fact that the
release function is able to tolerate pages not being present.

Also clean up shadows in the recovery path - currently they are not
released, which leaks a bit of memory.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Daniel Axtens <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Reviewed-by: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: use apply_to_existing_page_range() for releasing vmalloc shadow

kasan_release_vmalloc uses apply_to_page_range to release vmalloc
shadow. Unfortunately, apply_to_page_range can allocate memory to fill
in page table entries, which is not what we want.

Also, kasan_release_vmalloc is called under free_vmap_area_lock, so if
apply_to_page_range does allocate memory, we get a sleep in atomic bug:

BUG: sleeping function called from invalid context at mm/page_alloc.c:4681
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 15087, name:

Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x199/0x216 lib/dump_stack.c:118
___might_sleep.cold.97+0x1f5/0x238 kernel/sched/core.c:6800
__might_sleep+0x95/0x190 kernel/sched/core.c:6753
prepare_alloc_pages mm/page_alloc.c:4681 [inline]
__alloc_pages_nodemask+0x3cd/0x890 mm/page_alloc.c:4730
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2211
alloc_pages include/linux/gfp.h:532 [inline]
__get_free_pages+0xc/0x40 mm/page_alloc.c:4786
__pte_alloc_one_kernel include/asm-generic/pgalloc.h:21 [inline]
pte_alloc_one_kernel include/asm-generic/pgalloc.h:33 [inline]
__pte_alloc_kernel+0x1d/0x200 mm/memory.c:459
apply_to_pte_range mm/memory.c:2031 [inline]
apply_to_pmd_range mm/memory.c:2068 [inline]
apply_to_pud_range mm/memory.c:2088 [inline]
apply_to_p4d_range mm/memory.c:2108 [inline]
apply_to_page_range+0x77d/0xa00 mm/memory.c:2133
kasan_release_vmalloc+0xa7/0xc0 mm/kasan/common.c:970
__purge_vmap_area_lazy+0xcbb/0x1f30 mm/vmalloc.c:1313
try_purge_vmap_area_lazy mm/vmalloc.c:1332 [inline]
free_vmap_area_noflush+0x2ca/0x390 mm/vmalloc.c:1368
free_unmap_vmap_area mm/vmalloc.c:1381 [inline]
remove_vm_area+0x1cc/0x230 mm/vmalloc.c:2209
vm_remove_mappings mm/vmalloc.c:2236 [inline]
__vunmap+0x223/0xa20 mm/vmalloc.c:2299
__vfree+0x3f/0xd0 mm/vmalloc.c:2356
__vmalloc_area_node mm/vmalloc.c:2507 [inline]
__vmalloc_node_range+0x5d5/0x810 mm/vmalloc.c:2547
__vmalloc_node mm/vmalloc.c:2607 [inline]
__vmalloc_node_flags mm/vmalloc.c:2621 [inline]
vzalloc+0x6f/0x80 mm/vmalloc.c:2666
alloc_one_pg_vec_page net/packet/af_packet.c:4233 [inline]
alloc_pg_vec net/packet/af_packet.c:4258 [inline]
packet_set_ring+0xbc0/0x1b50 net/packet/af_packet.c:4342
packet_setsockopt+0xed7/0x2d90 net/packet/af_packet.c:3695
__sys_setsockopt+0x29b/0x4d0 net/socket.c:2117
__do_sys_setsockopt net/socket.c:2133 [inline]
__se_sys_setsockopt net/socket.c:2130 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2130
do_syscall_64+0xfa/0x780 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Switch to using the apply_to_existing_page_range() helper instead, which
won't allocate memory.

[[email protected]: s/apply_to_existing_pages/apply_to_existing_page_range/]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Daniel Axtens <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory.c: add apply_to_existing_page_range() helper

apply_to_page_range() takes an address range, and if any parts of it are
not covered by the existing page table hierarchy, it allocates memory to
fill them in.

In some use cases, this is not what we want - we want to be able to
operate exclusively on PTEs that are already in the tables.

Add apply_to_existing_page_range() for this. Adjust the walker
functions for apply_to_page_range to take 'create', which switches them
between the old and new modes.

This will be used in KASAN vmalloc.

[[email protected]: reduce code duplication]
[[email protected]: s/apply_to_existing_pages/apply_to_existing_page_range/]
[[email protected]: initialize __apply_to_page_range::err]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Axtens <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: fix crashes on access to memory mapped by vm_map_ram()

With CONFIG_KASAN_VMALLOC=y any use of memory obtained via vm_map_ram()
will crash because there is no shadow backing that memory.

Instead of sprinkling additional kasan_populate_vmalloc() calls all over
the vmalloc code, move it into alloc_vmap_area(). This will fix
vm_map_ram() and simplify the code a bit.

[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]:
Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Andrey Ryabinin <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor

Commit 22945688acd4 ("KVM: PPC: Book3S HV: Support reset of secure
guest") added a call to uv_svm_terminate, which is an ultravisor
call, without any check that the guest is a secure guest or even that
the system has an ultravisor. On a system without an ultravisor,
the ultracall will degenerate to a hypercall, but since we are not
in KVM guest context, the hypercall will get treated as a system
call, which could have random effects depending on what happens to
be in r0, and could also corrupt the current task's kernel stack.
Hence this adds a test for the guest being a secure guest before
doing uv_svm_terminate().

Fixes: 22945688acd4 ("KVM: PPC: Book3S HV: Support reset of secure guest")
Signed-off-by: Paul Mackerras <[email protected]>

io_uring: warn about unhandled opcode

Now that we have all the opcodes handled in terms of command prep and
SQE reuse, add a printk_once() to warn about any potentially new and
unhandled ones.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: read opcode and user_data from SQE exactly once

If we defer a request, we can't be reading the opcode again. Ensure that
the user_data and opcode fields are stable. For the user_data we already
have a place for it, for the opcode we can fill a one byte hold and store
that as well. For both of them, assign them when we originally read the
SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data
is switched to req->opcode and req->user_data.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: make IORING_OP_CANCEL_ASYNC deferrable

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable

If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: make HARDLINK imply LINK

The rules are as follows, if IOSQE_IO_HARDLINK is specified, then it's a
link and there is no need to set IOSQE_IO_LINK separately, though it
could be there. Add proper check and ensure that IOSQE_IO_HARDLINK
implies IOSQE_IO_LINK.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: any deferred command must have stable sqe data

We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.

Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: remove 'sqe' parameter to the OP helpers that take it

We pass in req->sqe for all of them, no need to pass it in as the
request is always passed in. This is a necessary prep patch to be
able to cleanup/fix the request prep path.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix pre-prepped issue with force_nonblock == true

Some of these code paths assume that any force_nonblock == true issue
is not prepped, but that's not true if we did prep as part of link setup
earlier. Check if we already have an async context allocate before
setting up a new one.

Cleanup the async context setup in general, we have a lot of duplicated
code there.

Fixes: 03b1230ca12a ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Fixes: f67676d160c6 ("io_uring: ensure async punted read/write requests copy iovec")
Signed-off-by: Jens Axboe <[email protected]>

io-wq: re-add io_wq_current_is_worker()

This reverts commit 8cdda87a4414, we now have several use csaes for this
helper. Reinstate it.

Signed-off-by: Jens Axboe <[email protected]>

dt-bindings: Add missing 'properties' keyword enclosing 'snps,tso'

DT property definitions must be under a 'properties' keyword. This was
missing for 'snps,tso' in an if/then clause. A meta-schema fix will
catch future errors like this.

Fixes: 7db3545aef5f ("dt-bindings: net: stmmac: Convert the binding to a schemas")
Cc: "David S. Miller" <[email protected]>
Acked-by: Maxime Ripard <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

Merge tag 'wireless-drivers-2019-12-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.5

First set of fixes for v5.5. Fixing security issues, some regressions
and few major bugs.

mwifiex

* security fix for handling country Information Elements (CVE-2019-14895)

* security fix for handling TDLS Information Elements

ath9k

* fix endian issue with ath9k_pci_owl_loader

mt76

* fix default mac address handling

iwlwifi

* fix merge damage which lead to firmware crashing during boot on some devices

* fix device initialisation regression on some devices
====================

Signed-off-by: David S. Miller <[email protected]>

dpaa2-ptp: fix double free of the ptp_qoriq IRQ

Upon reusing the ptp_qoriq driver, the ptp_qoriq_free() function was
used on the remove path to free any allocated resources.
The ptp_qoriq IRQ is among these resources that are freed in
ptp_qoriq_free() even though it is also a managed one (allocated using
devm_request_threaded_irq).

Drop the resource managed version of requesting the IRQ in order to not
trigger a double free of the interrupt as below:

[  226.731005] Trying to free already-free IRQ 126
[  226.735533] WARNING: CPU: 6 PID: 749 at kernel/irq/manage.c:1707
__free_irq+0x9c/0x2b8
[  226.743435] Modules linked in:
[  226.746480] CPU: 6 PID: 749 Comm: bash Tainted: G        W
5.4.0-03629-gfd7102c32b2c-dirty #912
[  226.755857] Hardware name: NXP Layerscape LX2160ARDB (DT)
[  226.761244] pstate: 40000085 (nZcv daIf -PAN -UAO)
[  226.766022] pc : __free_irq+0x9c/0x2b8
[  226.769758] lr : __free_irq+0x9c/0x2b8
[  226.773493] sp : ffff8000125039f0
(...)
[  226.856275] Call trace:
[  226.858710]  __free_irq+0x9c/0x2b8
[  226.862098]  free_irq+0x30/0x70
[  226.865229]  devm_irq_release+0x14/0x20
[  226.869054]  release_nodes+0x1b0/0x220
[  226.872790]  devres_release_all+0x34/0x50
[  226.876790]  device_release_driver_internal+0x100/0x1c0

Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver")
Cc: Yangbo Lu <[email protected]>
Signed-off-by: Ioana Ciornei <[email protected]>
Reviewed-by: Yangbo Lu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A mix of regression fixes and regular fixes for stable trees:

   - fix swapped error messages for qgroup enable/rescan

   - fixes for NO_HOLES feature with clone range

   - fix deadlock between iget/srcu lock/synchronize srcu while freeing
     an inode

   - fix double lock on subvolume cross-rename

   - tree log fixes
      * fix missing data checksums after replaying a log tree
      * also teach tree-checker about this problem
      * skip log replay on orphaned roots

   - fix maximum devices constraints for RAID1C -3 and -4

   - send: don't print warning on read-only mount regarding orphan
     cleanup

   - error handling fixes"

* tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: send: remove WARN_ON for readonly mount
  btrfs: do not leak reloc root if we fail to read the fs root
  btrfs: skip log replay on orphaned roots
  btrfs: handle ENOENT in btrfs_uuid_tree_iterate
  btrfs: abort transaction after failed inode updates in create_subvol
  Btrfs: fix hole extent items with a zero size after range cloning
  Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
  Btrfs: make tree checker detect checksum items with overlapping ranges
  Btrfs: fix missing data checksums after replaying a log tree
  btrfs: return error pointer from alloc_test_extent_buffer
  btrfs: fix devs_max constraints for raid1c3 and raid1c4
  btrfs: tree-checker: Fix error format string for size_t
  btrfs: don't double lock the subvol_sem for rename exchange
  btrfs: handle error in btrfs_cache_block_group
  btrfs: do not call synchronize_srcu() in inode_tree_del
  Btrfs: fix cloning range with a hole when using the NO_HOLES feature
  btrfs: Fix error messages in qgroup_rescan_init

early init: fix error handling when opening /dev/console

The comment says "this should never fail", but it definitely can fail
when you have odd initial boot filesystems, or kernel configurations.

So get the error handling right: filp_open() returns an error pointer.

Reported-by: Jesse Barnes <[email protected]>
Reported-by: youling 257 <[email protected]>
Fixes: 8243186f0cc7 ("fs: remove ksys_dup()")
Cc: Dominik Brodowski <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A small set of fixes for mostly minor issues here, the only real code
  ones are Wen Yang's fixes for error handling in the core and Christian
  Marussi's list_voltage() change which is a fix for disruptively bad
  performance for regulators with continuous voltage control (which are
  rare)"

* tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: rn5t618: fix module aliases
  regulator: max77650: add of_match table
  regulator: core: avoid unneeded .list_voltage calls
  regulator: s5m8767: Fix a warning message
  regulator: core: fix regulator_register() error paths to properly release rdev
  regulator: fix use after free issue

Merge tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A relatively large set of fixes here, the biggest part of it is for
  fallout from the GPIO descriptor rework that affected several of the
  devices with usable native chip select support. There's also some new
  PCI IDs for Intel Jasper Lake devices.

  The conversion to platform_get_irq() in the fsl driver is an
  incremental fix for build errors introduced on SPARC by the earlier
  fix for error handling in probe in that driver"

* tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
  spi: nxp-fspi: Ensure width is respected in spi-mem operations
  spi: spi-ti-qspi: Fix a bug when accessing non default CS
  spi: fsl: don't map irq during probe
  spi: spi-cavium-thunderx: Add missing pci_release_regions()
  spi: sprd: Fix the incorrect SPI register
  gpiolib: of: Make of_gpio_spi_cs_get_count static
  spi: fsl: Handle the single hardwired chipselect case
  gpio: Handle counting of Freescale chipselects
  spi: fsl: Fix GPIO descriptor support
  spi: dw: Correct handling of native chipselect
  spi: cadence: Correct handling of native chipselect
  spi: pxa2xx: Add support for Intel Jasper Lake

xfs: fix log reservation overflows when allocating large rt extents

Omar Sandoval reported that a 4G fallocate on the realtime device causes
filesystem shutdowns due to a log reservation overflow that happens when
we log the rtbitmap updates.  Factor rtbitmap/rtsummary updates into the
the tr_write and tr_itruncate log reservation calculation.

"The following reproducer results in a transaction log overrun warning
for me:

    mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
    mount -o rtdev=/dev/vdc /dev/vdb /mnt
    fallocate -l 4G /mnt/foo

Reported-by: Omar Sandoval <[email protected]>
Tested-by: Omar Sandoval <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"Fix kexec booting with certain EFI memory map layouts"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
"Add HPET quirks for the Intel 'Coffee Lake H' and 'Ice Lake' platforms"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel: Disable HPET on Intel Ice Lake platforms
x86/intel: Disable HPET on Intel Coffee Lake H platforms

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
"Fix the guest-nice cpustat values in /proc"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime, proc/stat: Fix incorrect guest nice cpustat value

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf tooling fixes from Ingo  Molnar:
"These are all perf tooling changes: most of them are fixes.

  Note that the large CPU count related fixes go beyond regression
  fixes, but the IPI-flood symptoms are severe enough that I think
  justifies their inclusion"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
  perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES
  libtraceevent: Allow custom libdir path
  perf header: Fix false warning when there are no duplicate cache entries
  perf metricgroup: Fix printing event names of metric group with multiple events
  perf/x86/pmu-events: Fix Kernel_Utilization metric
  perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS
  perf arch: Make the default get_cpuid() return compatible error
  tools headers kvm: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h headers
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  perf inject: Fix processing of ID index for injected instruction tracing
  perf report: Bail out --mem-mode if mem info is not available
  perf report: Make -F more strict like -s
  perf report/top TUI: Replace pr_err() with ui__error()
  libtraceevent: Copy pkg-config file to output folder when using O=
  libtraceevent: Fix lib installation with O=
  perf kvm: Clarify the 'perf kvm' -i and -o command line options
  tools arch x86: Sync asm/cpufeatures.h with the kernel sources
  perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
  ...

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"Tone down mutex debugging complaints, and annotate/fix spinlock
  debugging data accesses for KCSAN"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"
  locking/spinlock/debug: Fix various data races

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:
"Protect presistent EFI memory reservations from kexec, fix EFIFB early
  console, EFI stub graphics output fixes and other misc fixes."

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Don't attempt to map RCI2 config table if it doesn't exist
  efi/earlycon: Remap entire framebuffer after page initialization
  efi: Fix efi_loaded_image_t::unload type
  efi/gop: Fix memory leak in __gop_query32/64()
  efi/gop: Return EFI_SUCCESS if a usable GOP was found
  efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
  efi/memreserve: Register reservations as 'reserved' in /proc/iomem

random: don't forget compat_ioctl on urandom

Recently, there's been some compat ioctl cleanup, in which large
hardcoded lists were replaced with compat_ptr_ioctl. One of these
changes involved removing the random.c hardcoded list entries and adding
a compat ioctl function pointer to the random.c fops. In the process,
urandom was forgotten about, so this commit fixes that oversight.

Fixes: 507e4e2b430b ("compat_ioctl: remove /dev/random commands")
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

bpf: Fix cgroup local storage prog tracking

Recently noticed that we're tracking programs related to local storage maps
through their prog pointer. This is a wrong assumption since the prog pointer
can still change throughout the verification process, for example, whenever
bpf_patch_insn_single() is called.

Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
and the map would therefore remain in busy state forever. Fix this by using
the prog's aux pointer which is stable throughout verification and beyond.

Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net

block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT

Non-mq devs do not honor REQ_NOWAIT so give a chance to the caller to repeat
request gracefully on -EAGAIN error.

The problem is well reproduced using io_uring:

   mkfs.ext4 /dev/ram0
   mount /dev/ram0 /mnt

   # Preallocate a file
   dd if=/dev/zero of=/mnt/file bs=1M count=1

   # Start fio with io_uring and get -EIO
   fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file

Signed-off-by: Roman Penyaev <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

usbip: Fix error path of vhci_recv_ret_submit()

If a transaction error happens in vhci_recv_ret_submit(), event
handler closes connection and changes port status to kick hub_event.
Then hub tries to flush the endpoint URBs, but that causes infinite
loop between usb_hub_flush_endpoint() and vhci_urb_dequeue() because
"vhci_priv" in vhci_urb_dequeue() was already released by
vhci_recv_ret_submit() before a transmission error occurred. Thus,
vhci_urb_dequeue() terminates early and usb_hub_flush_endpoint()
continuously calls vhci_urb_dequeue().

The root cause of this issue is that vhci_recv_ret_submit()
terminates early without giving back URB when transaction error
occurs in vhci_recv_ret_submit(). That causes the error URB to still
be linked at endpoint list without “vhci_priv".

So, in the case of transaction error in vhci_recv_ret_submit(),
unlink URB from the endpoint, insert proper error code in
urb->status and give back URB.

Reported-by: Marek Marczykowski-Górecki <[email protected]>
Tested-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Suwan Kim <[email protected]>
Cc: stable <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usbip: Fix receive error in vhci-hcd when using scatter-gather

When vhci uses SG and receives data whose size is smaller than SG
buffer size, it tries to receive more data even if it acutally
receives all the data from the server. If then, it erroneously adds
error event and triggers connection shutdown.

vhci-hcd should check if it received all the data even if there are
more SG entries left. So, check if it receivces all the data from
the server in for_each_sg() loop.

Fixes: ea44d190764b ("usbip: Implement SG support to vhci-hcd and stub driver")
Reported-by: Marek Marczykowski-Górecki <[email protected]>
Tested-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Suwan Kim <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: EHCI: Do not return -EPIPE when hub is disconnected

When disconnecting a USB hub that has some child device(s) connected to it
(such as a USB mouse), then the stack tries to clear halt and
reset device(s) which are _already_ physically disconnected.

The issue has been reproduced with:

CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
SW: U-Boot 2019.07 and kernel 4.19.40.

CPU: HP Proliant Microserver Gen8.
SW: Linux version 4.2.3-300.fc23.x86_64

In this situation there will be error bit for MMF active yet the
CERR equals EHCI_TUNE_CERR + halt. Existing implementation
interprets this as a stall [1] (chapter 8.4.5).

The possible conditions when the MMF will be active + halt
can be found from [2] (Table 4-13).

Fix for the issue is to check whether MMF is active and PID Code is
IN before checking for the stall. If these conditions are true then
it is not a stall.

What happens after the fix is that when disconnecting a hub with
attached device(s) the situation is not interpret as a stall.

[1] [https://www.usb.org/document-library/usb-20-specification, usb_20.pdf]
[2] [https://www.intel.com/content/dam/www/public/us/en/documents/
technical-specifications/ehci-specification-for-usb.pdf]

Signed-off-by: Erkka Talvitie <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/ef70941d5f349767f19c0ed26b0dd9eed8ad81bb.1576050523.git.erkka.talvitie@vincit.fi
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'

Fixes the following compile error:

drivers/usb/typec/tcpm/fusb302.o: In function `tcpm_get_current_limit':
fusb302.c:(.text+0x3ee): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x422): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x450): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x48c): undefined reference to `extcon_get_state'
drivers/usb/typec/tcpm/fusb302.o: In function `fusb302_probe':
fusb302.c:(.text+0x980): undefined reference to `extcon_get_extcon_dev'
make: *** [vmlinux] Error 1

It is because EXTCON is build as a module, but FUSB302 is not.

Suggested-by: Heikki Krogerus <[email protected]>
Signed-off-by: zhong jiang <[email protected]>
Acked-by: Heikki Krogerus <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

intel_th: msu: Fix window switching without windows

Commit 6cac7866c2741 ("intel_th: msu: Add a sysfs attribute to trigger
window switch") adds a NULL pointer dereference in the case when there are
no windows allocated:

> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 1 SMP
> CPU: 5 PID: 1110 Comm: bash Not tainted 5.5.0-rc1+ #1
> RIP: 0010:msc_win_switch+0xa/0x80 [intel_th_msu]
> Call Trace:
> ? win_switch_store+0x9b/0xc0 [intel_th_msu]
> dev_attr_store+0x17/0x30
> sysfs_kf_write+0x3e/0x50
> kernfs_fop_write+0xda/0x1b0
> __vfs_write+0x1b/0x40
> vfs_write+0xb9/0x1a0
> ksys_write+0x67/0xe0
> __x64_sys_write+0x1a/0x20
> do_syscall_64+0x57/0x1d0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix that by disallowing window switching with multiwindow buffers without
windows.

Signed-off-by: Alexander Shishkin <[email protected]>
Fixes: 6cac7866c274 ("intel_th: msu: Add a sysfs attribute to trigger window switch")
Reviewed-by: Andy Shevchenko <[email protected]>
Reported-by: Ammy Yi <[email protected]>
Tested-by: Ammy Yi <[email protected]>
Cc: [email protected] # v5.2+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

intel_th: Fix freeing IRQs

Commit aac8da65174a ("intel_th: msu: Start handling IRQs") implicitly
relies on the use of devm_request_irq() to subsequently free the irqs on
device removal, but in case of the pci_free_irq_vectors() API, the
handlers need to be freed before it is called. Therefore, at the moment
the driver's remove path trips a BUG_ON(irq_has_action()):

> kernel BUG at drivers/pci/msi.c:375!
> invalid opcode: 0000 1 SMP
> CPU: 2 PID: 818 Comm: rmmod Not tainted 5.5.0-rc1+ #1
> RIP: 0010:free_msi_irqs+0x67/0x1c0
> pci_disable_msi+0x116/0x150
> pci_free_irq_vectors+0x1b/0x20
> intel_th_pci_remove+0x22/0x30 [intel_th_pci]
> pci_device_remove+0x3e/0xb0
> device_release_driver_internal+0xf0/0x1c0
> driver_detach+0x4c/0x8f
> bus_remove_driver+0x5c/0xd0
> driver_unregister+0x31/0x50
> pci_unregister_driver+0x40/0x90
> intel_th_pci_driver_exit+0x10/0xad6 [intel_th_pci]
> __x64_sys_delete_module+0x147/0x290
> ? exit_to_usermode_loop+0xd7/0x120
> do_syscall_64+0x57/0x1b0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by explicitly freeing irqs before freeing the vectors. We keep
using the devm_* variants because they are still useful in early error
paths.

Signed-off-by: Alexander Shishkin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Fixes: aac8da65174a ("intel_th: msu: Start handling IRQs")
Reported-by: Ammy Yi <[email protected]>
Tested-by: Ammy Yi <[email protected]>
Cc: [email protected] # v5.2+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

intel_th: pci: Add Elkhart Lake SOC support

This adds support for Intel Trace Hub in Elkhart Lake.

Signed-off-by: Alexander Shishkin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

intel_th: pci: Add Comet Lake PCH-V support

This adds Intel(R) Trace Hub PCI ID for Comet Lake PCH-V.

Signed-off-by: Alexander Shishkin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'asoc-fix-v5.5-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.5

A collection of fixes since the merge window, mostly driver specific but
there's a few in the core that clean up fallout from the refactorings
done in the last cycle.

tty/serial: atmel: fix out of range clock divider handling

Use MCK_DIV8 when the clock divider is > 65535. Unfortunately the mode
register was already written thus the clock selection is ignored.

Fix by doing the baud rate calulation before setting the mode.

Fixes: 5bf5635ac170 ("tty/serial: atmel: add fractional baud rate support")
Signed-off-by: David Engraf <[email protected]>
Acked-by: Ludovic Desroches <[email protected]>
Acked-by: Richard Genoud <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: link tty and port before configuring it as console

There seems to be a race condition in tty drivers and I could see on
many boot cycles a NULL pointer dereference as tty_init_dev() tries to
do 'tty->port->itty = tty' even though tty->port is NULL.
'tty->port' will be set by the driver and if the driver has not yet done
it before we open the tty device we can get to this situation. By adding
some extra debug prints, I noticed that:

6.650130: uart_add_one_port
6.663849: register_console
6.664846: tty_open
6.674391: tty_init_dev
6.675456: tty_port_link_device

uart_add_one_port() registers the console, as soon as it registers, the
userspace tries to use it and that leads to tty_open() but
uart_add_one_port() has not yet done tty_port_link_device() and so
tty->port is not yet configured when control reaches tty_init_dev().

Further look into the code and tty_port_link_device() is done by
uart_add_one_port(). After registering the console uart_add_one_port()
will call tty_port_register_device_attr_serdev() and
tty_port_link_device() is called from this.

Call add tty_port_link_device() before uart_configure_port() is done and
add a check in tty_port_link_device() so that it only links the port if
it has not been done yet.

Suggested-by: Jiri Slaby <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sched/cfs: fix spurious active migration

The load balance can fail to find a suitable task during the periodic check
because the imbalance is smaller than half of the load of the waiting
tasks. This results in the increase of the number of failed load balance,
which can end up to start an active migration. This active migration is
useless because the current running task is not a better choice than the
waiting ones. In fact, the current task was probably not running but
waiting for the CPU during one of the previous attempts and it had already
not been selected.

When load balance fails too many times to migrate a task, we should relax
the contraint on the maximum load of the tasks that can be migrated
similarly to what is done with cache hotness.

Before the rework, load balance used to set the imbalance to the average
load_per_task in order to mitigate such situation. This increased the
likelihood of migrating a task but also of selecting a larger task than
needed while more appropriate ones were in the list.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/fair: Fix find_idlest_group() to handle CPU affinity

Because of CPU affinity, the local group can be skipped which breaks the
assumption that statistics are always collected for local group. With
uninitialized local_sgs, the comparison is meaningless and the behavior
unpredictable. This can even end up to use local pointer which is to
NULL in this case.

If the local group has been skipped because of CPU affinity, we return
the idlest group.

Fixes: 57abff067a08 ("sched/fair: Rework find_idlest_group()")
Reported-by: John Stultz <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Tested-by: John Stultz <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

psi: Fix a division error in psi poll()

The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.

Use div64_u64.

Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: Jingfeng Xie <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime

Jingfeng reports rare div0 crashes in psi on systems with some uptime:

[58914.066423] divide error: 0000 [#1] SMP
[58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O)
[58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 #1
[58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019
[58914.102728] Workqueue: events psi_update_work
[58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000
[58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330
[58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246
[58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640
[58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e
[58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000
[58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000
[58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78
[58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000
[58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0
[58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58914.200360] PKRU: 55555554
[58914.203221] Stack:
[58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8
[58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000
[58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[58914.228734] Call Trace:
[58914.231337] [] psi_update_work+0x22/0x60
[58914.237067] [] process_one_work+0x189/0x420
[58914.243063] [] worker_thread+0x4e/0x4b0
[58914.248701] [] ? process_one_work+0x420/0x420
[58914.254869] [] kthread+0xe6/0x100
[58914.259994] [] ? kthread_park+0x60/0x60
[58914.265640] [] ret_from_fork+0x39/0x50
[58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1
[58914.279691] RIP [] psi_update_stats+0x1c1/0x330

The crashing instruction is trying to divide the observed stall time
by the sampling period. The period, stored in R8, is not 0, but we are
dividing by the lower 32 bits only, which are all 0 in this instance.

We could switch to a 64-bit division, but the period shouldn't be that
big in the first place. It's the time between the last update and the
next scheduled one, and so should always be around 2s and comfortably
fit into 32 bits.

The bug is in the initialization of new cgroups: we schedule the first
sampling event in a cgroup as an offset of sched_clock(), but fail to
initialize the last_update timestamp, and it defaults to 0. That
results in a bogusly large sampling period the first time we run the
sampling code, and consequently we underreport pressure for the first
2s of a cgroup's life. But worse, if sched_clock() is sufficiently
advanced on the system, and the user gets unlucky, the period's lower
32 bits can all be 0 and the sampling division will crash.

Fix this by initializing the last update timestamp to the creation
time of the cgroup, thus correctly marking the start of the first
pressure sampling period in a new cgroup.

Reported-by: Jingfeng Xie <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

perf/core: Add SRCU annotation for pmus list walk

Since commit
28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")

there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.

Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

perf/x86/intel: Fix PT PMI handling

Commit:

ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")

skips the PT/LBR exclusivity check on CPUs where PT and LBRs coexist, but
also inadvertently skips the active_events bump for PT in that case, which
is a bug. If there aren't any hardware events at the same time as PT, the
PMI handler will ignore PT PMIs, as active_events reads zero in that case,
resulting in the "Uhhuh" spurious NMI warning and PT data loss.

Fix this by always increasing active_events for PT events.

Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
Reported-by: Vitaly Slobodskoy <[email protected]>
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Alexey Budankov <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

perf/x86/intel/bts: Fix the use of page_private()

Commit

  8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver")

brought in a warning with the BTS buffer initialization
that is easily tripped with (assuming KPTI is disabled):

instantly throwing:

> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0
> Modules linked in:
> CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e77324fa #904
> RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0
> Call Trace:
>  rb_alloc_aux+0x339/0x550
>  perf_mmap+0x607/0xc70
>  mmap_region+0x76b/0xbd0
...

It appears to assume (for lost raisins) that PagePrivate() is set,
while later it actually tests for PagePrivate() before using
page_private().

Make it consistent and always check PagePrivate() before using
page_private().

Fixes: 8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver")
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

perf/x86: Fix potential out-of-bounds access

UBSAN reported out-of-bound accesses for x86_pmu.event_map(), it's
arguments should be < x86_pmu.max_events. Make sure all users observe
this constraint.

Reported-by: Meelis Roos <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Meelis Roos <[email protected]>

Merge tag 'perf-urgent-for-mingo-5.5-20191216' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes:

perf top:

Arnaldo Carvalho de Melo:

- Do not bail out when perf_env__read_cpuid() returns ENOSYS, which
   has been reported happening on aarch64.

perf metricgroup:

  Kajol Jain:

  - Fix printing event names of metric group with multiple events

vendor events:

x86:

  Ravi Bangoria:

  - Fix Kernel_Utilization metric.

s390:

  Ed Maste:

  - Fix counter long description for DTLB1_GPAGE_WRITES and L1D_RO_EXCL_WRITES.

perf header:

  Michael Petlan:

  - Fix false warning when there are no duplicate cache entries

libtraceevent:

  Sudip Mukherjee:

  - Allow custom libdir path

API headers:

  Arnaldo Carvalho de Melo:

  - Sync linux/kvm.h with the kernel sources.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

iommu/vt-d: Allocate reserved region for ISA with correct permission

Currently the reserved region for ISA is allocated with no
permissions. If a dma domain is being used, mapping this region will
fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE.

Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Cc: [email protected]
Cc: [email protected] # v5.3+
Fixes: d850c2ee5fe2 ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions")
Signed-off-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu: set group default domain before creating direct mappings

iommu_group_create_direct_mappings uses group->default_domain, but
right after it is called, request_default_domain_for_dev calls
iommu_domain_free for the default domain, and sets the group default
domain to a different domain. Move the
iommu_group_create_direct_mappings call to after the group default
domain is set, so the direct mappings get associated with that domain.

Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 7423e01741dd ("iommu: Add API to request DMA domain for device")
Signed-off-by: Jerry Snitselaar <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/vt-d: Fix dmar pte read access not set error

If the default DMA domain of a group doesn't fit a device, it
will still sit in the group but use a private identity domain.
When map/unmap/iova_to_phys come through iommu API, the driver
should still serve them, otherwise, other devices in the same
group will be impacted. Since identity domain has been mapped
with the whole available memory space and RMRRs, we don't need
to worry about the impact on it.

Link: https://www.spinics.net/lists/iommu/msg40416.html
Cc: Jerry Snitselaar <[email protected]>
Reported-by: Jerry Snitselaar <[email protected]>
Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with private")
Cc: [email protected] # v5.3+
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Tested-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

drm/i915: Fix WARN_ON condition for cursor plane ddb allocation

In some cases like latency[level]==0, wm[level].res_lines>31,
min_ddb_alloc can be U16_MAX, exclude it from the WARN_ON.

v2: Specify the cases in which we hit U16_MAX, indentation (Ville)

Fixes: 10a7e07b68b9 ("drm/i915: Make sure cursor has enough ddb for the selected wm level")
Suggested-by: Ville Syrjälä <[email protected]>
Signed-off-by: Vandita Kulkarni <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 4ba487019d1a196051feefab57f4a393815733b4)
Signed-off-by: Joonas Lahtinen <[email protected]>

scripts: package: mkdebian: add missing rsync dependency

We've missed the dependency to rsync, so build fails on
minimal containers.

Fixes: 59b2bd05f5f4 ("kbuild: add 'headers' target to build up uapi headers in usr/include")
Signed-off-by: Enrico Weigelt, metux IT consult <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>