Ben Gardon [Wed, 14 Oct 2020 18:26:41 +0000 (20:26 +0200)]
kvm: x86/mmu: Separate making SPTEs from set_spte
Separate the functions for generating leaf page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.
No functional change expected.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This commit introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Ben Gardon [Fri, 25 Sep 2020 21:22:48 +0000 (14:22 -0700)]
kvm: mmu: Separate making non-leaf sptes from link_shadow_page
The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Lai Jiangshan [Wed, 30 Sep 2020 04:16:59 +0000 (21:16 -0700)]
KVM: x86: Let the guest own CR4.FSGSBASE
Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the
guest own it on VMX. KVM never queries the guest's CR4.FSGSBASE value,
thus there is no reason to force VM-Exit on FSGSBASE being toggled.
Note, because FSGSBASE is conditionally available, this is dependent on
recent changes to intercept reserved CR4 bits and to update the CR4
guest/host mask in response to guest CPUID changes.
Intercept CR4 bits that are guest reserved so that KVM correctly injects
a #GP fault if the guest attempts to set a reserved bit. If a feature
is supported by the CPU but is not exposed to the guest, and its
associated CR4 bit is not intercepted by KVM by default, then KVM will
fail to inject a #GP if the guest sets the CR4 bit without triggering
an exit, e.g. by toggling only the bit in question.
Note, KVM doesn't give the guest direct access to any CR4 bits that are
also dependent on guest CPUID. Yet.
KVM: x86: Move call to update_exception_bitmap() into VMX code
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
back-to-back, subsume the exception bitmap update into the common CPUID
update. Drop the SVM invocation entirely as SVM's exception bitmap
doesn't vary with respect to guest CPUID.
KVM: x86: Invoke vendor's vcpu_after_set_cpuid() after all common updates
Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of
kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react
to changes made by the common code. In the near future, this will be
used by VMX to update its CR4 guest/host masks to account for reserved
bits. In the long term, SGX support will update the allowed XCR0 mask
for enclaves based on the vCPU's allowed XCR0.
vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by
commit 2acf923e38fb ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was
called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee
kvm_x86_ops->cpuid_update()). There is no indication that the placement
of the common code updates after the vendor updates was anything more
than a "new function at the end" decision.
Inspection of the current code reveals no dependency on kvm_x86_ops'
vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its
helpers. The bulk of the common code depends only on the guest's CPUID
configuration, kvm_mmu_reset_context() does not consume dynamic vendor
state, and there are no collisions between kvm_pmu_refresh() and VMX's
update of PT state.
Lai Jiangshan [Wed, 30 Sep 2020 04:16:55 +0000 (21:16 -0700)]
KVM: x86: Intercept LA57 to inject #GP fault when it's reserved
Unconditionally intercept changes to CR4.LA57 so that KVM correctly
injects a #GP fault if the guest attempts to set CR4.LA57 when it's
supported in hardware but not exposed to the guest.
Long term, KVM needs to properly handle CR4 bits that can be under guest
control but also may be reserved from the guest's perspective. But, KVM
currently sets the CR4 guest/host mask only during vCPU creation, and
reworking flows to change that will take a bit of elbow grease.
Even if/when generic support for intercepting reserved bits exists, it's
probably not worth letting the guest set CR4.LA57 directly. LA57 can't
be toggled while long mode is enabled, thus it's all but guaranteed to
be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
touched again. On the flip side, letting the guest own CR4.LA57 may
incur extra VMREADs. In other words, this temporary "hack" is probably
also the right long term fix.
The function amd_ir_set_vcpu_affinity makes use of the parameter struct
amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
amd_iommu_pi_data from a list when not running in AVIC mode.
However, prev_ga_tag is initialized only when AVIC is enabled. The non-zero
uninitialized value can cause unintended code path, which ends up making
use of the struct vcpu_svm.ir_list and ir_list_lock without being
initialized (since they are intended only for the AVIC case).
This triggers NULL pointer dereference bug in the function vm_ir_list_del
with the following call trace:
As vcpu->arch.cpuid_entries is now allocated dynamically, the only
remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/
KVM_SET_CPUID2 input for sanity. Since it was reported that the
current limit (80) is insufficient for some CPUs, bump
KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new
limit.
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
is reported to be insufficient but before we bump it let's switch to
allocating vcpu->arch.cpuid_entries[] array dynamically. Currently,
'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
but having it pre-allocated (for all vCPUs which we also pre-allocate)
gives us no real benefits.
Another plus of the dynamic allocation is that we now do kvm_check_cpuid()
check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so
no changes are made in case the check fails.
Opportunistically remove unneeded 'out' labels from
kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return
directly whenever possible.
KVM: x86: disconnect kvm_check_cpuid() from vcpu->arch.cpuid_entries
As a preparatory step to allocating vcpu->arch.cpuid_entries dynamically
make kvm_check_cpuid() check work with an arbitrary 'struct kvm_cpuid_entry2'
array.
Currently, when kvm_check_cpuid() fails we reset vcpu->arch.cpuid_nent to
0 and this is kind of weird, i.e. one would expect CPUIDs to remain
unchanged when KVM_SET_CPUID[2] call fails.
No functional change intended. It would've been possible to move the updated
kvm_check_cpuid() in kvm_vcpu_ioctl_set_cpuid2() and check the supplied
input before we start updating vcpu->arch.cpuid_entries/nent but we
can't do the same in kvm_vcpu_ioctl_set_cpuid() as we'll have to copy
'struct kvm_cpuid_entry' entries first. The change will be made when
vcpu->arch.cpuid_entries[] array becomes allocated dynamically.
Oliver Upton [Tue, 18 Aug 2020 15:24:28 +0000 (15:24 +0000)]
kvm: x86: only provide PV features if enabled in guest's CPUID
KVM unconditionally provides PV features to the guest, regardless of the
configured CPUID. An unwitting guest that doesn't check
KVM_CPUID_FEATURES before use could access paravirt features that
userspace did not intend to provide. Fix this by checking the guest's
CPUID before performing any paravirtual operations.
Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
aforementioned enforcement. Migrating a VM from a host w/o this patch to
a host with this patch could silently change the ABI exposed to the
guest, warranting that we default to the old behavior and opt-in for
the new one.
x86/kvm: Update the comment about asynchronous page fault in exc_page_fault()
KVM was switched to interrupt-based mechanism for 'page ready' event
delivery in Linux-5.8 (see commit 2635b5c4a0e4 ("KVM: x86: interrupt based
APF 'page ready' event delivery")) and #PF (ab)use for 'page ready' event
delivery was removed. Linux guest switched to this new mechanism
exclusively in 5.9 (see commit b1d405751cd5 ("KVM: x86: Switch KVM guest to
using interrupts for page ready APF delivery")) so it is not possible to
get #PF for a 'page ready' event even when the guest is running on top
of an older KVM (APF mechanism won't be enabled). Update the comment in
exc_page_fault() to reflect the new reality.
Paolo Bonzini [Tue, 20 Oct 2020 14:57:01 +0000 (10:57 -0400)]
KVM: VMX: Forbid userspace MSR filters for x2APIC
Allowing userspace to intercept reads to x2APIC MSRs when APICV is
fully enabled for the guest simply can't work. But more in general,
the LAPIC could be set to in-kernel after the MSR filter is setup
and allowing accesses by userspace would be very confusing.
We could in principle allow userspace to intercept reads and writes to TPR,
and writes to EOI and SELF_IPI, but while that could be made it work, it
would still be silly.
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace
filtering. Allowing userspace to intercept reads to x2APIC MSRs when
APICV is fully enabled for the guest simply can't work; the LAPIC and thus
virtual APIC is in-kernel and cannot be directly accessed by userspace.
To keep things simple we will in fact forbid intercepting x2APIC MSRs
altogether, independent of the default_allow setting.
On connector destruction call drm_dp_mst_topology_mgr_destroy
to release resources allocated in drm_dp_mst_topology_mgr_init.
Do it only if MST manager was initilized before otherwsie a crash
is seen on driver unload/device unplug.
From time to time, the novice kernel contributors do not add Reviewed-by
or Tested-by tags to the next versions of the patches. Mostly because
they are unaware that responsibility of adding these tags in next
version is on submitter, not maintainer.
Kees Cook [Thu, 15 Oct 2020 22:45:59 +0000 (15:45 -0700)]
docs: lkdtm: Modernize and improve details
The details on using LKDTM were overly obscure. Modernize the details
and expand examples to better illustrate how to use the interfaces.
Additionally add missing SPDX header.
The notes on replacing the deprecated str*cpy() functions didn't call
enough attention to the change in return type. Add these details and
clean up the language a bit more.
Linus Torvalds [Wed, 21 Oct 2020 18:28:43 +0000 (11:28 -0700)]
Merge tag 'linux-watchdog-5.10-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- Add Toshiba Visconti watchdog driver
- it87_wdt: add IT8772 + IT8784
- several fixes and improvements
* tag 'linux-watchdog-5.10-rc1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: Add Toshiba Visconti watchdog driver
watchdog: bindings: Add binding documentation for Toshiba Visconti watchdog device
watchdog: it87_wdt: add IT8784 ID
watchdog: sp5100_tco: Enable watchdog on Family 17h devices if disabled
watchdog: sp5100: Fix definition of EFCH_PM_DECODEEN3
watchdog: renesas_wdt: support handover from bootloader
watchdog: imx7ulp: Watchdog should continue running for wait/stop mode
watchdog: rti: Simplify with dev_err_probe()
watchdog: davinci: Simplify with dev_err_probe()
watchdog: cadence: Simplify with dev_err_probe()
watchdog: remove unneeded inclusion of <uapi/linux/sched/types.h>
watchdog: Use put_device on error
watchdog: Fix memleak in watchdog_cdev_register
watchdog: imx7ulp: Strictly follow the sequence for wdog operations
watchdog: it87_wdt: add IT8772 ID
watchdog: pcwd_usb: Avoid GFP_ATOMIC where it is not needed
drivers: watchdog: rdc321x_wdt: Fix race condition bugs
Linus Torvalds [Wed, 21 Oct 2020 18:22:08 +0000 (11:22 -0700)]
Merge tag 'rtc-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"A new driver this cycle is making the bulk of the changes and the
rx8010 driver has been rework to use the modern APIs.
Summary:
Subsystem:
- new generic DT properties: aux-voltage-chargeable,
trickle-voltage-millivolt
New driver:
- Microcrystal RV-3032
Drivers:
- ds1307: use aux-voltage-chargeable
- r9701, rx8010: modernization of the driver
- rv3028: fix clock output, trickle resistor values, RAM
configuration registers"
* tag 'rtc-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
rtc: r9701: set range
rtc: r9701: convert to devm_rtc_allocate_device
rtc: r9701: stop setting RWKCNT
rtc: r9701: remove useless memset
rtc: r9701: stop setting a default time
rtc: r9701: remove leftover comment
rtc: rv3032: Add a driver for Microcrystal RV-3032
dt-bindings: rtc: rv3032: add RV-3032 bindings
dt-bindings: rtc: add trickle-voltage-millivolt
rtc: rv3028: ensure ram configuration registers are saved
rtc: rv3028: factorize EERD bit handling
rtc: rv3028: fix trickle resistor values
rtc: rv3028: fix clock output support
rtc: mt6397: Remove unused member dev
rtc: rv8803: simplify the return expression of rv8803_nvram_write
rtc: meson: simplify the return expression of meson_vrtc_probe
rtc: rx8010: rename rx8010_init_client() to rx8010_init()
rtc: ds1307: enable rx8130's backup battery, make it chargeable optionally
rtc: ds1307: consider aux-voltage-chargeable
rtc: ds1307: store previous charge default per chip
...
Linus Torvalds [Wed, 21 Oct 2020 17:54:05 +0000 (10:54 -0700)]
Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- if a host can be a client, too, the I2C core can now use it to
emulate SMBus HostNotify support (STM32 and R-Car added this so far)
- also for client mode, a testunit has been added. It can create rare
situations on the bus, so host controllers can be tested
- a binding has been added to mark the bus as "single-master". This
allows for better timeout detections
- new driver for Mellanox Bluefield
- massive refactoring of the Tegra driver
- EEPROMs recognized by the at24 driver can now have custom names
- rest is driver updates
* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (80 commits)
Documentation: i2c: add testunit docs to index
i2c: tegra: Improve driver module description
i2c: tegra: Clean up whitespaces, newlines and indentation
i2c: tegra: Clean up and improve comments
i2c: tegra: Clean up printk messages
i2c: tegra: Clean up variable names
i2c: tegra: Improve formatting of variables
i2c: tegra: Check errors for both positive and negative values
i2c: tegra: Factor out hardware initialization into separate function
i2c: tegra: Factor out register polling into separate function
i2c: tegra: Factor out packet header setup from tegra_i2c_xfer_msg()
i2c: tegra: Factor out error recovery from tegra_i2c_xfer_msg()
i2c: tegra: Rename wait/poll functions
i2c: tegra: Remove "dma" variable from tegra_i2c_xfer_msg()
i2c: tegra: Remove redundant check in tegra_i2c_issue_bus_clear()
i2c: tegra: Remove likely/unlikely from the code
i2c: tegra: Remove outdated barrier()
i2c: tegra: Clean up variable types
i2c: tegra: Reorder location of functions in the code
i2c: tegra: Clean up probe function
...
Linus Torvalds [Wed, 21 Oct 2020 17:34:10 +0000 (10:34 -0700)]
Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
- a patch that removes crush_workspace_mutex (myself). CRUSH
computations are no longer serialized and can run in parallel.
- a couple new filesystem client metrics for "ceph fs top" command
(Xiubo Li)
- a fix for a very old messenger bug that affected the filesystem,
marked for stable (myself)
- assorted fixups and cleanups throughout the codebase from Jeff and
others.
* tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits)
libceph: clear con->out_msg on Policy::stateful_server faults
libceph: format ceph_entity_addr nonces as unsigned
libceph: fix ENTITY_NAME format suggestion
libceph: move a dout in queue_con_delay()
ceph: comment cleanups and clarifications
ceph: break up send_cap_msg
ceph: drop separate mdsc argument from __send_cap
ceph: promote to unsigned long long before shifting
ceph: don't SetPageError on readpage errors
ceph: mark ceph_fmt_xattr() as printf-like for better type checking
ceph: fold ceph_update_writeable_page into ceph_write_begin
ceph: fold ceph_sync_writepages into writepage_nounlock
ceph: fold ceph_sync_readpages into ceph_readpage
ceph: don't call ceph_update_writeable_page from page_mkwrite
ceph: break out writeback of incompatible snap context to separate function
ceph: add a note explaining session reject error string
libceph: switch to the new "osd blocklist add" command
libceph, rbd, ceph: "blacklist" -> "blocklist"
ceph: have ceph_writepages_start call pagevec_lookup_range_tag
ceph: use kill_anon_super helper
...
Darrick J. Wong [Fri, 9 Oct 2020 23:42:59 +0000 (16:42 -0700)]
xfs: fix fallocate functions when rtextsize is larger than 1
In commit fe341eb151ec, I forgot that xfs_free_file_space isn't strictly
a "remove mapped blocks" function. It is actually a function to zero
file space by punching out the middle and writing zeroes to the
unaligned ends of the specified range. Therefore, putting a rtextsize
alignment check in that function is wrong because that breaks unaligned
ZERO_RANGE on the realtime volume.
Furthermore, xfs_file_fallocate already has alignment checks for the
functions require the file range to be aligned to the size of a
fundamental allocation unit (which is 1 FSB on the data volume and 1 rt
extent on the realtime volume). Create a new helper to check fallocate
arguments against the realtiem allocation unit size, fix the fallocate
frontend to use it, fix free_file_space to delete the correct range, and
remove a now redundant check from insert_file_space.
NOTE: The realtime extent size is not required to be a power of two!
Fixes: fe341eb151ec ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Chandan Babu R <[email protected]>
Matthieu Baerts [Wed, 21 Oct 2020 10:51:53 +0000 (12:51 +0200)]
mptcp: depends on IPV6 but not as a module
Like TCP, MPTCP cannot be compiled as a module. Obviously, MPTCP IPv6'
support also depends on CONFIG_IPV6. But not all functions from IPv6
code are exported.
To simplify the code and reduce modifications outside MPTCP, it was
decided from the beginning to support MPTCP with IPv6 only if
CONFIG_IPV6 was built inlined. That's also why CONFIG_MPTCP_IPV6 was
created. More modifications are needed to support CONFIG_IPV6=m.
Even if it was not explicit, until recently, we were forcing CONFIG_IPV6
to be built-in because we had "select IPV6" in Kconfig. Now that we have
"depends on IPV6", we have to explicitly set "IPV6=y" to force
CONFIG_IPV6 not to be built as a module.
In other words, we can now only have CONFIG_MPTCP_IPV6=y if
CONFIG_IPV6=y.
Note that the new dependency might hide the fact IPv6 is not supported
in MPTCP even if we have CONFIG_IPV6=m. But selecting IPV6 like we did
before was forcing it to be built-in while it was maybe not what the
user wants.
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:43 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/tegra'
- Drop return value checking for debugfs_create() calls (Greg
Kroah-Hartman)
- Convert debugfs "ports" file to use DEFINE_SEQ_ATTRIBUTE() (Liu Shixin)
* remotes/lorenzo/pci/tegra:
PCI: tegra: Convert to use DEFINE_SEQ_ATTRIBUTE macro
PCI: tegra: No need to check return value of debugfs_create() functions
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:43 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/rcar'
- Document R8A774A1, R8A774B1, R8A774E1 endpoint support in DT (Lad
Prabhakar)
- Add R8A774A1, R8A774B1, R8A774E1 (RZ/G2M, RZ/G2N, RZ/G2H) IDs to endpoint
test (Lad Prabhakar)
- Add device tree support for R8A7742 (Lad Prabhakar)
- Use "fallthrough" pseudo-keyword (Gustavo A. R. Silva)
* remotes/lorenzo/pci/rcar:
dt-bindings: PCI: rcar: Add device tree support for r8a7742
PCI: rcar-gen2: Use fallthrough pseudo-keyword
misc: pci_endpoint_test: Add Device ID for RZ/G2H PCIe controller
dt-bindings: pci: rcar-pci-ep: Document r8a774e1
misc: pci_endpoint_test: Add Device ID for RZ/G2M and RZ/G2N PCIe controllers
dt-bindings: pci: rcar-pci-ep: Document r8a774a1 and r8a774b1
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:40 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/iproc'
- Set affinity mask on MSI interrupts (Mark Tomlinson)
- Simplify by using module_bcma_driver (Liu Shixin)
- Fix 'using integer as NULL pointer' warning (Krzysztof Wilczyński)
* remotes/lorenzo/pci/iproc:
PCI: iproc: Fix using plain integer as NULL pointer in iproc_pcie_pltfm_probe
PCI: iproc: Use module_bcma_driver to simplify the code
PCI: iproc: Set affinity mask on MSI interrupts
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:40 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/imx6'
- Use "fallthrough" pseudo-keyword (Gustavo A. R. Silva)
- Drop redundant error messages after devm_clk_get() (Anson Huang)
* remotes/lorenzo/pci/imx6:
PCI: imx6: Do not output error message when devm_clk_get() failed with -EPROBE_DEFER
PCI: imx6: Use fallthrough pseudo-keyword
- Add common iATU register support (Kunihiko Hayashi)
- Remove keystone iATU register mapping in favor of generic dwc support
(Kunihiko Hayashi)
- Skip PCIE_MSI_INTR0* programming if MSI is disabled (Jisheng Zhang)
- Fix MSI page leakage in suspend/resume (Jisheng Zhang)
- Check whether link is up before attempting config access (best-effort fix
even though it's racy) (Hou Zhiqiang)
* remotes/lorenzo/pci/dwc:
PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()
PCI: dwc: Fix MSI page leakage in suspend/resume
PCI: dwc: Skip PCIE_MSI_INTR0* programming if MSI is disabled
PCI: keystone: Remove iATU register mapping
PCI: dwc: Add common iATU register support
dt-bindings: PCI: uniphier-ep: Add iATU register description
dt-bindings: PCI: uniphier: Add iATU register description
PCI: dwc: Fix 'cast truncates bits from constant value'
misc: pci_endpoint_test: Add driver data for Layerscape PCIe controllers
misc: pci_endpoint_test: Add LS1088a in pci_device_id table
PCI: layerscape: Add EP mode support for ls1088a and ls2088a
PCI: layerscape: Modify the MSIX to the doorbell mode
PCI: layerscape: Modify the way of getting capability with different PEX
PCI: layerscape: Fix some format issue of the code
dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a and ls2088a
PCI: designware-ep: Modify MSI and MSIX CAP way of finding
PCI: designware-ep: Move the function of getting MSI capability forward
PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
PCI: designware-ep: Add multiple PFs support for DWC
PCI: dwc: Use DBI accessors
PCI: dwc: Move N_FTS setup to common setup
PCI: dwc/intel-gw: Drop unused max_width
PCI: dwc/intel-gw: Move getting PCI_CAP_ID_EXP offset to intel_pcie_link_setup()
PCI: dwc/intel-gw: Drop unnecessary checking of DT 'device_type' property
PCI: dwc: Set PORT_LINK_DLL_LINK_EN in common setup code
PCI: dwc: Centralize link gen setting
PCI: dwc: Make ATU accessors private
PCI: dwc: Remove read_dbi2 code
PCI: dwc/tegra: Use common Designware port logic register definitions
PCI: dwc: Remove hardcoded PCI_CAP_ID_EXP offset
PCI: dwc/qcom: Use common PCI register definitions
PCI: dwc/imx6: Use common PCI register definitions
PCI: dwc/meson: Rework PCI config and DW port logic register accesses
PCI: dwc/meson: Drop unnecessary RC config space initialization
PCI: dwc/meson: Drop the duplicate number of lanes setup
PCI: dwc: Ensure FAST_LINK_MODE is cleared
PCI: dwc: Add a 'num_lanes' field to struct dw_pcie
PCI: dwc/imx6: Remove duplicate define PCIE_LINK_WIDTH_SPEED_CONTROL
PCI: dwc: Check CONFIG_PCI_MSI inside dw_pcie_msi_init()
PCI: dwc/keystone: Drop duplicated 'num-viewport'
PCI: dwc: Simplify config space handling
PCI: dwc: Remove storing of PCI resources
PCI: dwc: Remove root_bus pointer
PCI: dwc: Convert to use pci_host_probe()
PCI: dwc: keystone: Convert .scan_bus() callback to use add_bus
PCI: Also call .add_bus() callback for root bus
PCI: dwc: Use generic config accessors
PCI: dwc: Remove dwc specific config accessor ops
PCI: dwc: histb: Use pci_ops for root config space accessors
PCI: dwc: exynos: Use pci_ops for root config space accessors
PCI: dwc: kirin: Use pci_ops for root config space accessors
PCI: dwc: meson: Use pci_ops for root config space accessors
PCI: dwc: tegra: Use pci_ops for root config space accessors
PCI: dwc: keystone: Use pci_ops for config space accessors
PCI: dwc: al: Use pci_ops for child config space accessors
PCI: dwc: Add a default pci_ops.map_bus for root port
PCI: dwc: Allow overriding bridge pci_ops
PCI: dwc: Use DBI accessors instead of own config accessors
PCI: Allow root and child buses to have different pci_ops
PCI: designware-ep: Fix the Header Type check
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:38 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/brcmstb'
- Make PCIE_BRCMSTB depend on and default to ARCH_BRCMSTB (Jim Quinlan)
- Add DT bindings for 7278, 7216, 7211, and new properties (Jim Quinlan)
- Add bcm7278 register info (Jim Quinlan)
- Add suspend and resume pm_ops (Jim Quinlan)
- Add bcm7278 PERST# support (Jim Quinlan)
- Add control of RESCAL reset (Jim Quinlan)
- Set additional internal memory DMA viewport sizes (Jim Quinlan)
- Accommodate MSI for older chips (Jim Quinlan)
- Set bus max burst size by chip type (Jim Quinlan)
- Add bcm7211, bcm7216, bcm7445, bcm7278 to match list (Jim Quinlan)
* remotes/lorenzo/pci/brcmstb:
PCI: brcmstb: Add bcm7211, bcm7216, bcm7445, bcm7278 to match list
PCI: brcmstb: Set bus max burst size by chip type
PCI: brcmstb: Accommodate MSI for older chips
PCI: brcmstb: Set additional internal memory DMA viewport sizes
PCI: brcmstb: Add control of rescal reset
PCI: brcmstb: Add bcm7278 PERST# support
PCI: brcmstb: Add suspend and resume pm_ops
PCI: brcmstb: Add bcm7278 register info
dt-bindings: PCI: Add bindings for more Brcmstb chips
PCI: brcmstb: PCIE_BRCMSTB depends on ARCH_BRCMSTB
- Fix initialization with old Marvell's Arm Trusted Firmware (Pali Rohár)
* remotes/lorenzo/pci/aardvark:
PCI: aardvark: Fix initialization with old Marvell's Arm Trusted Firmware
phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
PCI: aardvark: Move PCIe reset card code to advk_pcie_train_link()
PCI: aardvark: Implement driver 'remove' function and allow to build it as module
PCI: pci-bridge-emul: Export API functions
PCI: aardvark: Check for errors from pci_bridge_emul_init() call
PCI: aardvark: Fix compilation on s390
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:36 +0000 (09:58 -0500)]
Merge branch 'remotes/lorenzo/pci/apei'
- Add ACPI APEI notifier chain for unknown (vendor) CPER records (Shiju
Jose)
- Add handling of HiSilicon HIP PCIe controller errors (Yicong Yang)
* remotes/lorenzo/pci/apei:
PCI: hip: Add handling of HiSilicon HIP PCIe controller errors
ACPI / APEI: Add a notifier chain for unknown (vendor) CPER records
- Drop double zeroing for P2PDMA sg_init_table() (Julia Lawall)
* pci/misc:
PCI: v3-semi: Remove unneeded break
PCI/P2PDMA: Drop double zeroing for sg_init_table()
PCI: Simplify bool comparisons
PCI: endpoint: Use "NULL" instead of "0" as a NULL pointer
PCI: Simplify pci_dev_reset_slot_function()
PCI: Update mmap-related #ifdef comments
PCI/LINK: Print IRQ number used by port
PCI/IOV: Simplify pci-pf-stub with module_pci_driver()
PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
PCI: Remove unnecessary header includes
Bjorn Helgaas [Wed, 21 Oct 2020 14:58:34 +0000 (09:58 -0500)]
Merge branch 'pci/enumeration'
- Tone down message about missing optional MCFG (Jeremy Linton)
- Add schedule point in pci_read_config() (Jiang Biao)
- Add Ampere Altra SOC MCFG quirk (Tuan Phan)
- Add Kconfig options for MPS/MRRS strategy (Jim Quinlan)
* pci/enumeration:
PCI: Add Kconfig options for MPS/MRRS strategy
PCI/ACPI: Add Ampere Altra SOC MCFG quirk
PCI: Add schedule point in pci_read_config()
PCI/ACPI: Tone down missing MCFG message
Jon Derrick [Thu, 6 Aug 2020 21:00:17 +0000 (17:00 -0400)]
PCI: vmd: Update VMD PM to correctly use generic PCI PM
The pci_save_state() call in vmd_suspend() can be performed by
pci_pm_suspend_irq(). This also allows VMD to benefit from the call into
pci_prepare_to_sleep().
The pci_restore_state() call in vmd_resume() was restoring state after
pci_pm_resume()::pci_restore_standard_config() had already restored state.
It's also been suspected that the config state should have been restored
before re-requesting IRQs instead of afterwards.
Remove the pci_save_state()/pci_restore_state() calls in
vmd_suspend()/vmd_resume() to allow proper flow through generic PCI core
Power Management code.
vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
This commit removed unnecessary spin_locks in vhost_vring_call
and related operations. Because we manipulate irq offloading
contents in vhost_vdpa ioctl code path which is already
protected by dev mutex and vq mutex.
vringh: fix __vringh_iov() when riov and wiov are different
If riov and wiov are both defined and they point to different
objects, only riov is initialized. If the wiov is not initialized
by the caller, the function fails returning -EINVAL and printing
"Readable desc 0x... after writable" error message.
This issue happens when descriptors have both readable and writable
buffers (eg. virtio-blk devices has virtio_blk_outhdr in the readable
buffer and status as last byte of writable buffer) and we call
__vringh_iov() to get both type of buffers in two different iovecs.
Let's replace the 'else if' clause with 'if' to initialize both
riov and wiov if they are not NULL.
As checkpatch pointed out, we also avoid crashing the kernel
when riov and wiov are both NULL, replacing BUG() with WARN_ON()
and returning -EINVAL.
Eli Cohen [Tue, 8 Sep 2020 12:33:46 +0000 (15:33 +0300)]
vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
set_map() is used by mlx5 vdpa to create a memory region based on the
address map passed by the iotlb argument. If we get successive calls, we
will destroy the current memory region and build another one based on
the new address mapping. We also need to setup the hardware resources
since they depend on the memory region.
If these calls happen before DRIVER_OK, It means that driver VQs may
also not been setup and we may not create them yet. In this case we want
to avoid setting up the other resources and defer this till we get
DRIVER OK.
If protected virtualization is active on s390, VIRTIO has only retricted
access to the guest memory.
Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export
arch_has_restricted_virtio_memory_access to advertize VIRTIO if that's
the case.
Pierre Morel [Thu, 10 Sep 2020 08:53:49 +0000 (10:53 +0200)]
virtio: let arch advertise guest's memory access restrictions
An architecture may restrict host access to guest memory,
e.g. IBM s390 Secure Execution or AMD SEV.
Provide a new Kconfig entry the architecture can select,
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides
the arch_has_restricted_virtio_memory_access callback to advertise
to VIRTIO common code when the architecture restricts memory access
from the host.
The common code can then fail the probe for any device where
VIRTIO_F_ACCESS_PLATFORM is required, but not set.
Dai Ngo [Mon, 19 Oct 2020 03:42:49 +0000 (23:42 -0400)]
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have
build errors and some configs with NFSD=m to get NFS4ERR_STALE
error when doing inter server copy.
Added ops table in nfs_common for knfsd to access NFS client modules.
Fixes: 3ac3711adb88 ("NFSD: Fix NFS server build errors") Signed-off-by: Dai Ngo <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
Chris Wilson [Tue, 11 Aug 2020 09:25:32 +0000 (10:25 +0100)]
drm/i915: Drop runtime-pm assert from vgpu io accessors
The "mmio" writes into vgpu registers are simple memory traps from the
guest into the host. We do not need to assert in the guest that the
device is awake for the io as we do not write to the device itself.
However, over time we have refactored all the mmio accessors with the
result that the vgpu reuses the gen2 accessors and so inherits the
assert for runtime-pm of the native device. The assert though has
actually been there since commit 3be0bf5acca6 ("drm/i915: Create vGPU
specific MMIO operations to reduce traps").
Chris Wilson [Mon, 19 Oct 2020 10:15:23 +0000 (11:15 +0100)]
drm/i915: Force VT'd workarounds when running as a guest OS
If i915.ko is being used as a passthrough device, it does not know if
the host is using intel_iommu. Mixing the iommu and gfx causes a few
issues (such as scanout overfetch) which we need to workaround inside
the driver, so if we detect we are running under a hypervisor, also
assume the device access is being virtualised.
Chris Wilson [Mon, 19 Oct 2020 16:50:05 +0000 (17:50 +0100)]
drm/i915: Exclude low pages (128KiB) of stolen from use
The GPU is trashing the low pages of its reserved memory upon reset. If
we are using this memory for ringbuffers, then we will dutiful resubmit
the trashed rings after the reset causing further resets, and worse. We
must exclude this range from our own use. The value of 128KiB was found
by empirical measurement (and verified now with a selftest) on gen9.
Chris Wilson [Mon, 19 Oct 2020 08:34:44 +0000 (09:34 +0100)]
drm/i915/gt: Onion unwind for scratch page allocation failure
In switching to using objects for our ppGTT scratch pages, care was not
taken to avoid trying to unref NULL objects on failure. And for gen6
ppGTT, it appears we forgot entirely to unwind after a partial allocation
failure.
SeongJae Park [Wed, 23 Sep 2020 06:18:41 +0000 (08:18 +0200)]
xen-blkfront: Apply changed parameter name to the document
Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
amount of segments in indirect requests but missed updating the
document. This commit updates the document.
SeongJae Park [Wed, 23 Sep 2020 06:18:40 +0000 (08:18 +0200)]
xen-blkfront: add a parameter for disabling of persistent grants
Persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus it is
required to be disabled. It can be disabled from blkback side using a
module parameter, 'feature_persistent'. But, it is impossible from
blkfront side. For the reason, this commit adds a blkfront module
parameter for disabling of the feature.
SeongJae Park [Wed, 23 Sep 2020 06:18:39 +0000 (08:18 +0200)]
xen-blkback: add a parameter for disabling of persistent grants
Persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus it is
required to be disabled. But, there is no option to disable it. For
the reason, this commit adds a module parameter for disabling of the
feature.
Stephen Boyd [Tue, 20 Oct 2020 21:45:44 +0000 (14:45 -0700)]
arm64: proton-pack: Update comment to reflect new function name
The function detect_harden_bp_fw() is gone after commit d4647f0a2ad7
("arm64: Rewrite Spectre-v2 mitigation code"). Update this comment to
reflect the new state of affairs.