drm/vmwgfx: don't check for old_crtc_state enable status
During atomic check, when preparing the new topology, there is no need to
check whether old_crtc_state was enabled. Doing so causes atomic_check to
fail because, due to connector routing, a crtc can be in atomic_state even
if its enable status did not change.
Antoine Tenart [Wed, 19 Sep 2018 13:29:06 +0000 (15:29 +0200)]
net: mvneta: fix the Rx desc buffer DMA unmapping
With CONFIG_DMA_API_DEBUG enabled we now get a warning when using the
mvneta driver:
mvneta d0030000.ethernet: DMA-API: device driver frees DMA memory with
wrong function [device address=0x000000001165b000] [size=4096 bytes]
[mapped as page] [unmapped as single]
This is because when using the s/w buffer management, the Rx descriptor
buffer is mapped with dma_map_page but unmapped with dma_unmap_single.
This patch fixes this by using the right unmapping function.
Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Wed, 19 Sep 2018 13:02:07 +0000 (15:02 +0200)]
ip6_tunnel: be careful when accessing the inner header
The ip6 tunnel xmit ndo assumes that the processed skb always
contains an ip[v6] header, but syzbot has found a way to send
frames that fall short of this assumption, leading to the following splat:
This change addresses the issue by adding the needed check before
accessing the inner header.
The ipv4 side of the issue is apparently there since the ipv4 over ipv6
initial support, and the ipv6 side predates git history.
Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Tested-by: Alexander Potapenko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
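The kind of check described above can be sketched in user space as follows.
This is only an illustration, not the kernel change: struct pkt stands in
for an skb, and inner_ip_version() and the size constants are hypothetical
names. The point is that the length is validated before the header is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: a flat buffer plus its valid length. */
struct pkt {
	const uint8_t *data;
	size_t len;
};

/* Minimum header sizes of the two inner protocols the tunnel carries. */
#define IPV4_HDR_MIN 20u
#define IPV6_HDR_LEN 40u

/*
 * Return the IP version (4 or 6) of the inner packet, or -1 if the buffer
 * is too short to hold the corresponding header. No byte is read before
 * the length has been checked.
 */
int inner_ip_version(const struct pkt *p)
{
	unsigned int version;

	if (p->len < 1)
		return -1;	/* not even the version nibble is present */

	version = p->data[0] >> 4;
	if (version == 4 && p->len >= IPV4_HDR_MIN)
		return 4;
	if (version == 6 && p->len >= IPV6_HDR_LEN)
		return 6;
	return -1;		/* truncated or unknown: caller should drop */
}
```

A caller would drop the frame on -1 instead of dereferencing the header.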
Robert Shearman [Wed, 19 Sep 2018 12:56:53 +0000 (13:56 +0100)]
ipv6: Allow the l3mdev to be a loopback
There is currently no way for an IPv6 client to connect using a loopback
address in a VRF, whereas for IPv4 the loopback address can be added:
$ sudo ip addr add dev vrfred 127.0.0.1/8
$ sudo ip -6 addr add ::1/128 dev vrfred
RTNETLINK answers: Cannot assign requested address
So allow ::1 to be configured on an L3 master device. In order for
this to be usable ip_route_output_flags needs to not consider ::1 to
be a link scope address (since oif == l3mdev and so it would be
dropped), and ipv6_rcv needs to consider the l3mdev to be a loopback
device so that it doesn't drop the packets.
net: hns3: Fix parameter type for q_id in hclge_tm_q_to_qs_map_cfg()
So far, all callers of hclge_tm_q_to_qs_map_cfg() pass a u16 value as
"q_id", and hclge_tm_q_to_qs_map_cfg() itself converts "q_id" to le16.
The max tqp number for a pf can be more than 256, so we should use "u16"
instead of "u8" to store the queue id; "u8" may cause data loss.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
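The data loss mentioned above is plain integer truncation. A minimal
sketch, with hypothetical function names that only mirror the parameter
widths before and after the fix:

```c
#include <stdint.h>

/* Narrow parameter, as before the fix: ids above 255 are truncated. */
uint16_t map_queue_u8(uint8_t q_id)
{
	return q_id;
}

/* Widened parameter, as after the fix: the full 16-bit id is preserved. */
uint16_t map_queue_u16(uint16_t q_id)
{
	return q_id;
}
```

Passing queue id 300 through the u8 variant yields 44 (300 modulo 256),
while the u16 variant returns 300 unchanged.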
net: hns3: Fix client initialize state issue when roce client initialize failed
When roce is loaded before nic, the roce client is not initialized
until the nic client is initialized, but the roce init flag is set before
that. Furthermore, when nic initialization succeeds and roce initialization
fails, the nic init flag is not set and the roce init flag is not cleared.
This patch fixes it by setting the init flag only after the client has been
initialized successfully.
net: hns3: Clear client pointer when initialize client failed or unintialize finished
If initializing a client fails, or once uninitializing a client has
finished, we should clear the client pointer, since using an uninitialized
client may cause unexpected results. We should also check whether the
client exists before uninitializing it.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
net: hns3: Fix cmdq registers initialization issue for vf
According to hardware's description, the head pointer register should
be written before the tail pointer register while initializing the vf
command queue. Otherwise, it may trigger an interrupt even though there
is no command received.
Fixes: fedd0c15d288 ("net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
net: hns3: Fix for setting speed for phy failed problem
genphy_read_status reads phy information from HW and uses it to update
SW variables. If a user sets the phy speed with ethtool while the service
task is calling hclge_get_mac_phy_link, the result of the speed setting is
uncertain, because both the ethtool command and hclge_get_mac_phy_link
modify phydev.
Because phy state machine will update phy link periodically, we can
just use phydev->link to check the link status. This patch removes
function call of genphy_read_status. To ensure accuracy, this patch
adds a phy state check. If phy state is not PHY_RUNNING, we consider
link is down. Because in some scenarios, phydev->link may be link up,
but phy state is not PHY_RUNNING. This is just an intermediate state.
In fact, the link is not ready yet.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
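The link check described above can be sketched as follows. The types are
simplified stand-ins, not the phylib definitions, and mac_phy_link_up() is
a hypothetical name: the link counts as up only when the phy state is
"running" and the cached link flag is set.

```c
#include <stdbool.h>

/* Simplified stand-ins for the phylib state and device; not kernel types. */
enum phy_state_sketch { PHY_SK_DOWN, PHY_SK_UP, PHY_SK_RUNNING };

struct phydev_sketch {
	enum phy_state_sketch state;
	bool link;	/* maintained periodically by the phy state machine */
};

/* Report link up only when the state machine has reached "running". */
bool mac_phy_link_up(const struct phydev_sketch *phy)
{
	if (phy->state != PHY_SK_RUNNING)
		return false;	/* intermediate state: link not ready yet */
	return phy->link;
}
```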
Peng Li [Wed, 19 Sep 2018 17:29:53 +0000 (18:29 +0100)]
net: hns3: Check hdev state when getting link status
By default, HW link status is up. If hclge_update_link_status is called
before net up, the driver prints "link up", which is misleading. The hdev
state therefore needs to be checked when getting link status.
net: hns3: Set STATE_DOWN bit of hdev state when stopping net
We clear the STATE_DOWN bit of hdev state when starting net, but do not
set it again when stopping net. As a result the net is down but hdev state
still reports up. The STATE_DOWN bit of hdev state should be set when
stopping net.
Peng Li [Wed, 19 Sep 2018 17:29:50 +0000 (18:29 +0100)]
net: hns3: Remove packet statistics of public
All PFs have permission to read the public packet statistics in hardware,
but the read operation clears the registers, which makes the statistics
inaccurate.
This patch removes all public packet statistics.
Peng Li [Wed, 19 Sep 2018 17:29:47 +0000 (18:29 +0100)]
net: hns3: Add default irq affinity
All irqs float to cpu0 if irq affinity is not set.
This patch adds a default irq affinity in the hns3 driver; users can
still change the irq affinity in the OS.
net: sun: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
net: amd: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
net: broadcom: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
net: xilinx: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
net: toshiba: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
net: marvell: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
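The ndo_start_xmit return-type fixes above all follow one pattern, which
can be sketched outside the kernel like this. The enum is a local copy for
illustration, and net_device_ops_sketch, demo_ring_full and
demo_start_xmit are hypothetical names, not taken from any of the drivers.

```c
/* Local copies for illustration; the real definitions are kernel headers. */
typedef enum netdev_tx {
	NETDEV_TX_OK   = 0x00,	/* driver took care of the packet */
	NETDEV_TX_BUSY = 0x10,	/* driver Tx path was busy */
} netdev_tx_t;

struct sk_buff;		/* opaque here */
struct net_device;	/* opaque here */

/* Cut-down ops table holding only the callback discussed above. */
struct net_device_ops_sketch {
	netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
				      struct net_device *dev);
};

/* Hypothetical driver state: non-zero when the Tx ring has no free slot. */
int demo_ring_full;

/* The implementation's return type matches the callback's, not 'int'. */
netdev_tx_t demo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return demo_ring_full ? NETDEV_TX_BUSY : NETDEV_TX_OK;
}

const struct net_device_ops_sketch demo_ops = {
	.ndo_start_xmit = demo_start_xmit,
};
```

With matching types the assignment into the ops table needs no cast, and a
stray 'return -ENOMEM;' style value stands out in review.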
====================
net: phy: phylink: ensure the carrier is off when starting phylink
Following the discussion we had regarding the phylink issue related to
the carrier link state not being off when starting phylink, I sent a fix
patch a few days ago for the PPv2 driver:
https://lkml.org/lkml/2018/9/14/633
The idea was to send a patch which could go to the stable branches, but
a better solution would be to directly call netif_carrier_off() from
within phylink_start(). This is the aim of this series.
====================
Antoine Tenart [Wed, 19 Sep 2018 09:39:31 +0000 (11:39 +0200)]
net: phy: phylink: ensure the carrier is off when starting phylink
Phylink made an assumption about the carrier state being down when
calling phylink_start(). If this assumption isn't satisfied, the
internal phylink state could misbehave and a net device could end up not
being functional.
This patch fixes this by explicitly calling netif_carrier_off() in
phylink_start().
====================
net: mvpp2: improve the interrupt usage
This series aims to improve the interrupts descriptions and usage in the
Marvell PPv2 driver.
- Before the series, interrupts were named after their s/w usage,
which in fact can be configured. The series renames all those
interrupts and adds a description of the ones left over.
- In PPv2 the interrupts are mapped to vectors. Those vectors were
directly mapped to a given CPU, and per-cpu accesses were done. While
this worked in our cases, the register accesses mapped to the vectors
are not actually linked to a given CPU. They are instead linked to
what is called a "s/w thread". The series modifies this so that the s/w
threads are used instead of the CPU numbers, by adding an indirection.
This means we can now have systems with more CPUs than s/w threads.
This is based on today's net-next, and was tested on various boards
using both versions of the PPv2 engine.
Two more patches will be coming, to update the device trees describing a
PPv2 engine. The patches are ready, but will go through a different
tree. I'll send them once this series is accepted. This is not an
issue as the PPv2 driver keeps the dt bindings backward compatible.
====================
Antoine Tenart [Wed, 19 Sep 2018 09:27:11 +0000 (11:27 +0200)]
net: mvpp2: rename mvpp2_percpu function to mvpp2_thread
As the mvpp2_percpu_read/write/... functions aren't really per-cpu but
per s/w thread, rename them to include 'thread' instead of 'percpu'.
This is a cosmetic patch.
Antoine Tenart [Wed, 19 Sep 2018 09:27:10 +0000 (11:27 +0200)]
net: mvpp2: handle cases where more CPUs are available than s/w threads
The Marvell PPv2 network controller has 9 internal threads. The driver
works fine when there are fewer CPUs available than threads, but not
when more CPUs are available. As this is a valid use case, handle
this particular case.
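One way to picture the CPU to s/w thread indirection is the sketch below.
The limit of 9 threads comes from the text; the function names and the
modulo mapping are assumptions made for illustration, not a statement of
what the driver does. It assumes at least one CPU is present.

```c
#define MAX_SW_THREADS 9	/* from the text: 9 internal s/w threads */

/* Number of s/w threads used: never more than the CPUs present. */
unsigned int nthreads_for(unsigned int num_cpus)
{
	return num_cpus < MAX_SW_THREADS ? num_cpus : MAX_SW_THREADS;
}

/* Map a CPU number onto a s/w thread; extra CPUs share threads. */
unsigned int cpu_to_thread(unsigned int cpu, unsigned int nthreads)
{
	return cpu % nthreads;
}
```

On a 16-CPU system this gives 9 threads, and CPU 12 shares thread 3 with
CPU 3, so accesses through a shared thread need their own locking.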
Antoine Tenart [Wed, 19 Sep 2018 09:27:04 +0000 (11:27 +0200)]
net: mvpp2: fix the number of queues per cpu for PPv2.2
The Marvell PPv2.2 engine only has 8 Rx queues per CPU, while PPv2.1 has
16 of them. This patch updates the code so that the Rx queues mask width
is selected given the version of the network controller used.
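The version-dependent mask width can be sketched as below. The queue
counts (16 and 8) come from the text; the enum and function names are
hypothetical.

```c
enum ppv2_version { PPV21, PPV22 };

/* Rx queues per CPU: 16 on PPv2.1, 8 on PPv2.2 (figures from the text). */
unsigned int rxq_per_cpu(enum ppv2_version v)
{
	return v == PPV21 ? 16 : 8;
}

/* Bit mask with one bit per Rx queue for the given controller version. */
unsigned int rxq_mask(enum ppv2_version v)
{
	return (1u << rxq_per_cpu(v)) - 1;
}
```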
Antoine Tenart [Wed, 19 Sep 2018 09:27:03 +0000 (11:27 +0200)]
net: mvpp2: do not update the queue mode while probing
This patch updates the probing function so that the queue mode isn't
updated while probing, as the driver would silently end up using a
configuration not wanted by the user. The patch adds an extra check to
validate the chosen queue mode instead, and the driver will fail to
probe if the configuration is invalid.
Antoine Tenart [Wed, 19 Sep 2018 09:27:01 +0000 (11:27 +0200)]
net: mvpp2: rename the IRQs to match the hardware
This patch renames the IRQs in the Marvell PPv2 driver, as their current
names match the way they are used in software. But this will change in
the future, and those IRQs have nothing to do with Rx/Tx interrupts
(this can be configured). The new binding also describes more interrupts,
as some were left out.
The old binding support is kept for backward compatibility.
Antoine Tenart [Wed, 19 Sep 2018 09:27:00 +0000 (11:27 +0200)]
net: mvpp2: increase the number of s/w threads to 9
This patch sets the number of s/w threads to 9, its maximum value,
instead of 8. This is not a fix as only 4 of the s/w threads were used
so far, but more could be used in the future.
David S. Miller [Thu, 20 Sep 2018 04:06:46 +0000 (21:06 -0700)]
Merge branch 'phy_stop-synchronous'
Heiner Kallweit says:
====================
net: phy: make phy_stop() synchronous
There have been a few not very successful attempts in the past to make
phy_stop() a synchronous call instead of just changing the state.
Patch 1 of this series addresses an issue which prevented this change.
At least for me it works fine now. I would appreciate it if Geert could
re-test as well that suspend doesn't throw an error.
====================
net: phy: call state machine synchronously in phy_stop
phy_stop() may be called e.g. when suspending, therefore all needed
actions should be performed synchronously. Therefore add a synchronous
call to the state machine.
net: linkwatch: add check for netdevice being present to linkwatch_do_dev
When bringing down the netdevice (incl. detaching it) and calling
netif_carrier_off directly or indirectly the latter triggers an
asynchronous linkwatch event.
This linkwatch event eventually may fail to access chip registers in
the ndo_get_stats/ndo_get_stats64 callback because the device isn't
accessible any longer, see call trace in [0].
To prevent this scenario don't check for IFF_UP only, but also make
sure that the netdevice is present.
David S. Miller [Thu, 20 Sep 2018 03:32:46 +0000 (20:32 -0700)]
Merge tag 'batadv-net-for-davem-20180919' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
pull request for net: batman-adv 2018-09-19
here are some bugfixes which we would like to see integrated into net.
We forgot to bump the version number in the last round for net-next, so
the belated patch to do that is included - we hope you can adopt it.
This will most likely create a merge conflict later when merging into
net-next with this round's net-next patchset, but net-next should keep
the 2018.4 version[1].
Dave Airlie [Thu, 20 Sep 2018 00:00:31 +0000 (10:00 +1000)]
Merge tag 'drm-misc-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v4.19-rc5:
- Fix crash in vgem in drm_drv_uses_atomic_modeset.
- Allow atomic drivers that don't set DRIVER_ATOMIC to create debugfs entries.
- Fix compiler warning for unused connector_funcs.
- Fix null pointer deref on UDL unplug.
- Disable DRM support for sun4i's R40 for now.
(Not all patches went in for v4.19, so it has to wait a cycle.)
- NULL-terminate the of_device_id table in pl111.
- Make sure vc4 NV12 planar format works when displaying an unscaled fb.
Drew Schmitt [Mon, 20 Aug 2018 17:32:15 +0000 (10:32 -0700)]
KVM: x86: Control guest reads of MSR_PLATFORM_INFO
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.
Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).
Drew Schmitt [Mon, 20 Aug 2018 17:32:14 +0000 (10:32 -0700)]
KVM: x86: Turbo bits in MSR_PLATFORM_INFO
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
the CPUID faulting bit was settable. But now any bit in
MSR_PLATFORM_INFO would be settable. This can be used, for example, to
convey frequency information about the platform on which the guest is
running.
Krish Sadhukhan [Fri, 24 Aug 2018 00:03:03 +0000 (20:03 -0400)]
nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
According to section "Checks on VMX Controls" in Intel SDM vol 3C,
the following check needs to be enforced on vmentry of L2 guests:
- Bits 5:0 of the posted-interrupt descriptor address are all 0.
- The posted-interrupt descriptor address does not set any bits
beyond the processor's physical-address width.
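The two checks listed above amount to a few bit operations. A minimal
sketch, where pi_desc_addr_valid() and its maxphyaddr parameter (the
physical-address width in bits) are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a posted-interrupt descriptor address:
 *  - bits 5:0 must all be zero (64-byte alignment);
 *  - no bit at or above the physical-address width may be set.
 */
bool pi_desc_addr_valid(uint64_t addr, unsigned int maxphyaddr)
{
	if (addr & 0x3fULL)
		return false;
	if (maxphyaddr < 64 && (addr >> maxphyaddr))
		return false;
	return true;
}
```

With a 36-bit physical-address width, 0x1000 passes, 0x1020 fails the
alignment check, and 1 << 36 fails the width check.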
KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
If L1 does not intercept L2 HLT, or enters L2 in HLT activity-state,
it is possible for a vCPU to be blocked while it is in guest-mode.
According to Intel SDM 26.6.5 Interrupt-Window Exiting and
Virtual-Interrupt Delivery: "These events wake the logical processor
if it just entered the HLT state because of a VM entry".
Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending
deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
should be woken from the HLT state and injected with the interrupt.
In addition, if the vCPU receives a nested posted-interrupt while it is
blocked in guest-mode, then the vCPU should also be woken and injected
with the posted interrupt.
To handle these cases, this patch enhances kvm_vcpu_has_events() to also
check if there is a pending interrupt in L2 virtual APICv provided by
L1. That is, it evaluates if there is a pending virtual interrupt for L2
by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
Evaluation of Pending Interrupts.
Note that this also handles the case of nested posted-interrupt by the
fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
kvm_vcpu_running() -> vmx_check_nested_events() ->
vmx_complete_nested_posted_interrupt().
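The RVI[7:4] > VPPR[7:4] evaluation quoted above compares only the
priority class (upper nibble) of the two 8-bit values. A minimal sketch,
with virt_intr_pending() as a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A virtual interrupt is pending when the priority class of the requesting
 * vector (RVI bits 7:4) exceeds that of the virtual processor priority
 * (VPPR bits 7:4). The low nibbles do not take part in the comparison.
 */
bool virt_intr_pending(uint8_t rvi, uint8_t vppr)
{
	return (rvi & 0xf0) > (vppr & 0xf0);
}
```

For example RVI 0x51 against VPPR 0x4f is pending (class 5 above class 4),
while RVI 0x4f against VPPR 0x40 is not, since both are in class 4.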
The functions
kvm_load_guest_fpu()
kvm_put_guest_fpu()
are only used locally, make them static. This requires also that both
functions are moved because they are used before their implementation.
Those functions were exported (via EXPORT_SYMBOL) before commit e5bb40251a920 ("KVM: Drop kvm_{load,put}_guest_fpu() exports").
KVM: VMX: use preemption timer to force immediate VMExit
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest. Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI. This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).
When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI. Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed. But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0. Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.
For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing. The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter. Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.
Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate. There are no known errata affecting timer value of '0'.
[1] I/O SMIs also have guarantees on when they arrive, but I have
no idea if/how those are emulated in KVM.
Signed-off-by: Sean Christopherson <[email protected]>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
KVM: VMX: modify preemption timer bit only when arming timer
Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths. For example, the preemption timer can be used to force an
immediate VMExit. Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.
KVM: VMX: immediately mark preemption timer expired only for zero value
A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest. This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.
Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit. Unlike '0', a
timer value of '1' has no architectural guarantees regarding
when it is delivered.
Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...
Lei Yang [Wed, 29 Aug 2018 07:04:08 +0000 (15:04 +0800)]
kvm: selftests: use -pthread instead of -lpthread
I ran into the following error:
testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create'
testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
My gcc version is 4.8.4.
"-pthread" would work everywhere.
Wei Yang [Fri, 7 Sep 2018 11:59:47 +0000 (19:59 +0800)]
KVM: x86: don't reset root in kvm_mmu_setup()
Here is the code path which shows kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
the root_hpa and prev_roots are guaranteed to be invalid at that point, so
it is not necessary to reset them again.
x86/kvm/lapic: always disable MMIO interface in x2APIC mode
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014
The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.
When APIC access is virtualized we correctly manipulate the VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode, so there's no issue.
Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.
The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate an APIC access page and
create a KVM memory region, so in x2APIC mode all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.
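The "ignore writes, return ~0 for reads" behaviour can be modelled in a
few lines. The struct and function names are hypothetical and the register
file is a simplified array indexed by offset / 16; this is a model of the
behaviour described above, not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified local APIC model: 64 registers spaced 16 bytes apart. */
struct lapic_sketch {
	bool x2apic_mode;
	uint32_t regs[0x40];
};

/* In x2APIC mode every MMIO read returns all ones. */
uint32_t apic_mmio_read_sketch(const struct lapic_sketch *apic,
			       uint32_t offset)
{
	if (apic->x2apic_mode)
		return UINT32_MAX;
	return apic->regs[(offset >> 4) & 0x3f];
}

/* In x2APIC mode every MMIO write is silently dropped. */
void apic_mmio_write_sketch(struct lapic_sketch *apic, uint32_t offset,
			    uint32_t val)
{
	if (apic->x2apic_mode)
		return;
	apic->regs[(offset >> 4) & 0x3f] = val;
}
```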
Jakub Kicinski [Tue, 18 Sep 2018 17:13:59 +0000 (10:13 -0700)]
tools: bpf: fix license for a compat header file
libc_compat.h is used by libbpf so make sure it's licensed under
LGPL or BSD license. The license change should be OK, I'm the only
author of the file.
Willem de Bruijn [Tue, 18 Sep 2018 20:20:18 +0000 (16:20 -0400)]
flow_dissector: fix build failure without CONFIG_NET
If boolean CONFIG_BPF_SYSCALL is enabled, kernel/bpf/syscall.c will
call flow_dissector functions from net/core/flow_dissector.c.
This causes this build failure if CONFIG_NET is disabled:
kernel/bpf/syscall.o: In function `__x64_sys_bpf':
syscall.c:(.text+0x3278): undefined reference to
`skb_flow_dissector_bpf_prog_attach'
syscall.c:(.text+0x3310): undefined reference to
`skb_flow_dissector_bpf_prog_detach'
kernel/bpf/syscall.o:(.rodata+0x3f0): undefined reference to
`flow_dissector_prog_ops'
kernel/bpf/verifier.o:(.rodata+0x250): undefined reference to
`flow_dissector_verifier_ops'
Analogous to other optional BPF program types in syscall.c, add stubs
if the relevant functions are not compiled and move the BPF_PROG_TYPE
definition in the #ifdef CONFIG_NET block.
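The stub technique mentioned above generally looks like the sketch below.
CONFIG_NET_SKETCH and both function names are hypothetical, and the
-EOPNOTSUPP return value is an assumption for illustration: when the
option is off, an inline stub replaces the real function so callers still
link.

```c
#include <errno.h>

#ifdef CONFIG_NET_SKETCH
/* Real implementation lives in another object file when enabled. */
int flow_dissector_attach_sketch(int prog_fd);
#else
/* Stub used when the feature is compiled out: fail cleanly at run time. */
static inline int flow_dissector_attach_sketch(int prog_fd)
{
	(void)prog_fd;
	return -EOPNOTSUPP;
}
#endif

/* Caller compiles and links identically in both configurations. */
int bpf_attach_sketch(int prog_fd)
{
	return flow_dissector_attach_sketch(prog_fd);
}
```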
xen: issue warning message when out of grant maptrack entries
When a driver domain (e.g. dom0) is running out of maptrack entries it
can't map any more foreign domain pages. Instead of silently stalling
the affected domUs issue a rate limited warning in this case in order
to make it easier to detect that situation.
Merge tag 'perf-urgent-for-mingo-4.19-20180918' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix the build on !_GNU_SOURCE libc systems such as Alpine Linux/musl
libc due to usage of strerror_r glibc variant on libbpf (Arnaldo Carvalho de Melo)
- Fix out-of-tree asciidoctor man page generation (Ben Hutchings)
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Crypto stuff from Herbert:
"This push fixes a potential boot hang in ccp and an incorrect
CPU capability check in aegis/morus on x86."
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
crypto: ccp - add timeout support in the SEV command
Merge tag 'trace-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Steven writes:
"Vaibhav Nagarnaik found that modifying the ring buffer size could cause
a huge latency in the system because it does a while loop to free pages
without releasing the CPU (on non preempt kernels). In a case where there
are hundreds of thousands of pages to free it could actually cause a system
stall. A properly placed cond_resched() solves this issue."
Merge tag 'platform-drivers-x86-v4.19-2' of git://git.infradead.org/linux-platform-drivers-x86
Darren writes:
"platform-drivers-x86 for v4.19-2
Free allocated ACPI buffers in two drivers.
The following is an automated git shortlog grouped by driver:
alienware-wmi:
- Correct a memory leak
dell-smbios-wmi:
- Correct a memory leak"
* tag 'platform-drivers-x86-v4.19-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: alienware-wmi: Correct a memory leak
platform/x86: dell-smbios-wmi: Correct a memory leak
====================
ipv6: fix issues on accessing fib6_metrics
The latest fix for the memory leak of fib6_metrics still causes a
use-after-free.
This patch series first reverts the previous fix and then proposes a new
fix that is more in line with the ipv4 logic and has been tested to fix the
reported use-after-free issue.
====================
Wei Wang [Tue, 18 Sep 2018 20:45:00 +0000 (13:45 -0700)]
ipv6: fix memory leak on dst->_metrics
When dst->_metrics and f6i->fib6_metrics share the same memory, both
take reference count on the dst_metrics structure. However, when dst is
destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
which does not take care of READONLY metrics and does not release refcnt.
This causes memory leak.
Similar to ipv4 logic, the fix is to properly release refcnt and free
the memory space pointed by dst->_metrics if refcnt becomes 0.
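The "release the refcnt and free at zero" rule can be sketched in user
space as follows. The struct and helpers are hypothetical, and a plain int
is used where the kernel would use an atomic reference count.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared metrics block with a (non-atomic) reference count. */
struct metrics_sketch {
	int refcnt;
	unsigned int mtu;
};

/* Allocate a block owned by one holder. Returns NULL on failure. */
struct metrics_sketch *metrics_alloc(unsigned int mtu)
{
	struct metrics_sketch *m = malloc(sizeof(*m));

	if (m) {
		m->refcnt = 1;
		m->mtu = mtu;
	}
	return m;
}

/* A second user (e.g. a dst sharing the fib entry's metrics) takes a ref. */
struct metrics_sketch *metrics_get(struct metrics_sketch *m)
{
	m->refcnt++;
	return m;
}

/* Drop one reference; free the block and return true when it was the last. */
bool metrics_put(struct metrics_sketch *m)
{
	if (--m->refcnt > 0)
		return false;
	free(m);
	return true;
}
```

Every holder that took a reference must call the put helper exactly once;
skipping it leaks the block, and an extra call frees it under another user.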
This change causes use-after-free on dst->_metrics.
The crash trace looks like this:
[ 97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
[ 97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
Russell King [Tue, 18 Sep 2018 15:48:53 +0000 (16:48 +0100)]
sfp: fix oops with ethtool -m
If a network interface is created prior to the SFP socket being
available, ethtool can request module information. This unfortunately
leads to an oops:
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = (ptrval)
[00000008] *pgd=7c400831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1480 Comm: ethtool Not tainted 4.19.0-rc3 #138
Hardware name: Broadcom Northstar Plus SoC
PC is at sfp_get_module_info+0x8/0x10
LR is at dev_ethtool+0x218c/0x2afc
Fix this by not filling in the network device's SFP bus pointer until
SFP is fully bound, thereby avoiding the core calling into the SFP bus
code.
Fixes: ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages")
Reported-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Tue, 18 Sep 2018 14:58:47 +0000 (16:58 +0200)]
net: mvpp2: fix a txq_done race condition
When no Tx IRQ is available, the txq_done() routine (called from
tx_done()) shouldn't be called from the polling function, as in that
case it is already called in the Tx path thanks to an hrtimer. This
mostly occurred when using PPv2.1, as that engine does not have Tx
IRQs.
Fixes: edc660fa09e2 ("net: mvpp2: replace TX coalescing interrupts with hrtimer")
Reported-by: Stefan Chulski <[email protected]>
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.
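The conversion problem described above can be reproduced in a few lines.
The function names are hypothetical; the cast in the buggy variant spells
out what the compiler does implicitly when an int is compared with a
size_t, and the fixed variant shows one possible way to handle it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Buggy check: "did we send fewer bytes than requested?". The cast makes
 * explicit the implicit int to size_t conversion of 'rc < len', which
 * turns a negative error code into a huge positive value.
 */
bool send_failed_buggy(int rc, size_t len)
{
	return (size_t)rc < len;
}

/* Fixed check: test for a negative error code before comparing sizes. */
bool send_failed_fixed(int rc, size_t len)
{
	return rc < 0 || (size_t)rc < len;
}
```

With rc = -32 and len = 16 the buggy variant reports success, because -32
converted to size_t is far larger than 16; the fixed variant reports the
failure.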
net/smc: enable fallback for connection abort in state INIT
If a linkgroup is terminated abnormally already due to failing
LLC CONFIRM LINK or LLC ADD LINK, fallback to TCP is still possible.
In this case do not switch to state SMC_PEERABORTWAIT and do not set
sk_err.
For a failing smc_listen_rdma_finish() smc_listen_decline() is
called. If fallback is possible, the new socket is already enqueued
to be accepted in smc_listen_decline(). Avoid enqueuing a second time
afterwards in this case, otherwise the smc_create_lgr_pending lock
is released twice:
[ 373.463976] WARNING: bad unlock balance detected!
[ 373.463978] 4.18.0-rc7+ #123 Tainted: G O
[ 373.463979] -------------------------------------
[ 373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at:
[ 373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
[ 373.463991] but there are no more locks to release!
[ 373.463991]
other info that might help us debug this:
[ 373.463993] 2 locks held by kworker/1:1/30:
[ 373.463994] #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0
[ 373.464000] #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0
[ 373.464003]
stack backtrace:
[ 373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G O 4.18.0-rc7uschi+ #123
[ 373.464007] Hardware name: IBM 2827 H43 738 (LPAR)
[ 373.464010] Workqueue: events smc_listen_work [smc]
[ 373.464011] Call Trace:
[ 373.464015] ([<0000000000114100>] show_stack+0x60/0xd8)
[ 373.464019] [<0000000000a8c9bc>] dump_stack+0x9c/0xd8
[ 373.464021] [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108
[ 373.464022] [<00000000001e045c>] lock_release+0x114/0x4f8
[ 373.464025] [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300
[ 373.464027] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
[ 373.464029] [<0000000000197a68>] process_one_work+0x2a8/0x6b0
[ 373.464030] [<0000000000197ec2>] worker_thread+0x52/0x410
[ 373.464033] [<000000000019fd0e>] kthread+0x15e/0x178
[ 373.464035] [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc
[ 373.464052] [<0000000000aaf584>] kernel_thread_starter+0x0/0xc
[ 373.464054] INFO: lockdep is turned off.
In state SMC_INIT smc_poll() delegates polling to the internal
CLC socket. This means, once the connect worker has finished
its kernel_connect() step, the poll wake-up may occur. This is not
intended. The wake-up should occur from the wake up call in
smc_connect_work() after __smc_connect() has finished.
Thus in state SMC_INIT this patch now calls sock_poll_wait() on the
main SMC socket.
EtherAVB hardware requires 0 to be written to status register bits in
order to clear them. However, care must be taken not to:
1. Clear other bits, by writing zero to them
2. Write one to reserved bits
This patch corrects the ravb driver with respect to the second point above.
This is done by defining reserved bit masks for the affected registers and,
after auditing the code, ensuring that all sites that may write a one to a
reserved bit are suitably masked.
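The masking rule can be sketched as follows. The reserved mask value and
both function names are hypothetical; status_after_write() is a small
model of a register in which written zeros clear bits and written ones
leave them unchanged.

```c
#include <stdint.h>

/* Hypothetical layout: the upper 16 bits of the register are reserved. */
#define STATUS_RESERVED_SK 0xffff0000u

/*
 * Value to write in order to clear 'bits': zeros at the bits to clear and
 * at all reserved bits, ones everywhere else so other flags are kept.
 */
uint32_t status_clear_value(uint32_t bits)
{
	return ~(bits | STATUS_RESERVED_SK);
}

/* Hardware model: a written 0 clears the bit, a written 1 keeps it. */
uint32_t status_after_write(uint32_t status, uint32_t written)
{
	return status & written;
}
```

Clearing bits 0 and 2 writes 0x0000fffa: the other status bits are left
alone (point 1) and no reserved bit receives a one (point 2).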