Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main change here is a revert of reverts. We recently simplified
some code that was thought unnecessary; however, since then KVM has
grown quite a few cond_resched()s and for that reason the simplified
code is prone to livelocks---one CPUs tries to empty a list of guest
page tables while the others keep adding to them. This adds back the
generation-based zapping of guest page tables, which was not
unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple
of s390 fixlets as well"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
KVM: x86: work around leak of uninitialized stack contents
KVM: nVMX: handle page fault in vmread
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
"Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header"
* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: modify the Image header to improve compatibility with the ARM64 header
1) Don't corrupt xfrm_interface parms before validation, from Nicolas
Dichtel.
2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
Leonardo Bras.
4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
5) Missing ULP check in sock_map, from John Fastabend.
6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
7) Fix length of SKB allocated in 6pack driver, from Christophe
JAILLET.
8) ip6_route_info_create() returns an error pointer, not NULL. From
Maciej Żenczykowski.
9) Only add RDS sock to the hashes after rs_transport is set, from
Ka-Cheong Poon.
10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
11) Presence of transmit IPSEC offload in an SKB is not tested for
correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
Kirsher.
12) Need rcu_barrier() when register_netdevice() takes one of the
notifier based failure paths, from Subash Abhinov Kasiviswanathan.
13) Fix leak in sctp_do_bind(), from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
cdc_ether: fix rndis support for Mediatek based smartphones
sctp: destroy bucket if failed to bind addr
sctp: remove redundant assignment when call sctp_get_port_local
sctp: change return type of sctp_get_port_local
ixgbevf: Fix secpath usage for IPsec Tx offload
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
ixgbe: Fix secpath usage for IPsec TX offload.
net: qrtr: fix memort leak in qrtr_tun_write_iter
net: Fix null de-reference of device refcount
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
tun: fix use-after-free when register netdev failed
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
ixgbe: fix double clean of Tx descriptors with xdp
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
mlx4: fix spelling mistake "veify" -> "verify"
net: hns3: fix spelling mistake "undeflow" -> "underflow"
net: lmc: fix spelling mistake "runnin" -> "running"
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
net/rds: An rds_sock is added too early to the hash table
mac80211: Do not send Layer 2 Update frame before authorization
...
Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"From the maintainer summit, just some last minute fixes for final:
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads"
* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
drm/lima: fix lima_gem_wait() return value
drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
drm/i915: Limit MST to <= 8bpc once again
drm/modes: Make the whitelist more const
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
James Harvey reported a livelock that was introduced by commit d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").
The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule. The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8623, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.
There are three ways to fix the livelock:
- Reverting the revert (commit d012a06ab1d23) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices. It's unlikely we'll root cause the device
assignment regression soon enough to fix the regression timely.
- Remove the conditional reschedule from kvm_mmu_zap_all(). However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.
- Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved, which is what this
patch does.
For all intents and purposes, this is a revert of commit ea145aacf4ae8
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86:
use the fast way to invalidate all pages") respectively.
KVM: x86: work around leak of uninitialized stack contents
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.
Paul Walmsley [Sat, 14 Sep 2019 01:35:50 +0000 (18:35 -0700)]
riscv: modify the Image header to improve compatibility with the ARM64 header
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header. One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field. If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0. This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format. Another problem was that the original
"res3" field was not being initialized correctly to zero.
Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field. RISC-V binaries will
store "RSC\x05" in this field. The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time. Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly. Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.
====================
net: devlink: move reload fail indication to devlink core and expose to user
First two patches are dependencies of the last one. That moves devlink
reload failure indication to the devlink code, so the drivers do not
have to track it themselves. Currently it is only mlxsw, but I will send
a follow-up patchset that introduces this in netdevsim too.
====================
cdc_ether: fix rndis support for Mediatek based smartphones
A Mediatek based smartphone owner reports problems with USB
tethering in Linux. The verbose USB listing shows a rndis_host
interface pair (e0/01/03 + 10/00/00), but the driver fails to
bind with
[ 355.960428] usb 1-4: bad CDC descriptors
The problem is a failsafe test intended to filter out ACM serial
functions using the same 02/02/ff class/subclass/protocol as RNDIS.
The serial functions are recognized by their non-zero bmCapabilities.
No RNDIS function with non-zero bmCapabilities were known at the time
this failsafe was added. But it turns out that some Wireless class
RNDIS functions are using the bmCapabilities field. These functions
are uniquely identified as RNDIS by their class/subclass/protocol, so
the failing test can safely be disabled. The same applies to the two
types of Misc class RNDIS functions.
Applying the failsafe to Communication class functions only retains
the original functionality, and fixes the problem for the Mediatek based
smartphone.
Tow examples of CDC functional descriptors with non-zero bmCapabilities
from Wireless class RNDIS functions are:
0e8d:000a Mediatek Crosscall Spider X5 3G Phone
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x0f
connection notifications
sends break
line coding and serial state
get/set/clear comm features
CDC Union:
bMasterInterface 0
bSlaveInterface 1
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
and
19d2:1023 ZTE K4201-z
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x02
line coding and serial state
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
CDC Union:
bMasterInterface 0
bSlaveInterface 1
The Mediatek example is believed to apply to most smartphones with
Mediatek firmware. The ZTE example is most likely also part of a larger
family of devices/firmwares.
David S. Miller [Fri, 13 Sep 2019 20:06:20 +0000 (22:06 +0200)]
Merge branch 'sctp_do_bind-leak'
Mao Wenan says:
====================
fix memory leak for sctp_do_bind
First two patches are to do cleanup, remove redundant assignment,
and change return type of sctp_get_port_local.
Third patch is to fix memory leak for sctp_do_bind if failed
to bind address.
v2: add one patch to change return type of sctp_get_port_local.
====================
This is because in sctp_do_bind, if sctp_get_port_local is to
create hash bucket successfully, and sctp_add_bind_addr failed
to bind address, e.g return -ENOMEM, so memory leak found, it
needs to destroy allocated bucket.
Mao Wenan [Thu, 12 Sep 2019 04:02:17 +0000 (12:02 +0800)]
sctp: change return type of sctp_get_port_local
Currently sctp_get_port_local() returns a long
which is either 0,1 or a pointer casted to long.
It's neither of the callers use the return value since
commit 62208f12451f ("net: sctp: simplify sctp_get_port").
Now two callers are sctp_get_port and sctp_do_bind,
they actually assumend a casted to an int was the same as
a pointer casted to a long, and they don't save the return
value just check whether it is zero or non-zero, so
it would better change return type from long to int for
sctp_get_port_local.
Willem de Bruijn [Wed, 11 Sep 2019 19:50:51 +0000 (15:50 -0400)]
ip: support SO_MARK cmsg
Enable setting skb->mark for UDP and RAW sockets using cmsg.
This is analogous to existing support for TOS, TTL, txtime, etc.
Packet sockets already support this as of commit c7d39e32632e
("packet: support per-packet fwmark for af_packet sendmsg").
Similar to other fields, implement by
1. initialize the sockcm_cookie.mark from socket option sk_mark
2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
3. initialize inet_cork.mark from sockcm_cookie.mark
4. initialize each (usually just one) skb->mark from inet_cork.mark
Step 1 is handled in one location for most protocols by ipcm_init_sk
as of commit 351782067b6b ("ipv4: ipcm_cookie initializers").
Michael Straube [Tue, 10 Sep 2019 19:04:22 +0000 (21:04 +0200)]
rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic version
Function _rtl92d_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92d_evm_db_to_percentage and use the generic version instead.
Michael Straube [Tue, 10 Sep 2019 19:04:21 +0000 (21:04 +0200)]
rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic version
Function _rtl92c_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92c_evm_db_to_percentage and use the generic version instead.
Michael Straube [Tue, 10 Sep 2019 19:04:20 +0000 (21:04 +0200)]
rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic version
Function _rtl92c_evm_db_to_percentage is functionally identical
to the generic version rtl_evm_db_to_percentage, so remove
_rtl92c_evm_db_to_percentage and use the generic version instead.
This mechanism reduces the numbers of false alram in cck rate by
dynamically adjusting the value of power threshold and cs_ratio.
We determine the new value by three factors, which are rssi, false alarm
count and igi. Based on these factors, we define the current condition
into five levels. Compared to the previous level, if the level is changed,
we set the new values for power threshold and cs_ratio.
Since 8822c requires to do not only IQK, but also DPK.
Move these calibrations that need to be done once the channel
is determined, into phy_calibration.
And note that the order of the calibrations matters, 8822c
should do IQK first, then DPK.
Power amplifiers are not linear components, and require DPK to
reduce its nonlinearity. DPK is called Digital Pre-distortion
Calibration, can be used to compensate the output of power.
DPK tracking is in charge of tracking the thermal changes. And
it then shifts the power curve accordingly, which makes the
power output remains linear even if the PA works in different
temperature.
To perform DPK, the parameter table should also be updated.
And the table will be applied when device is powered on.
Then DPK will reference the values to calibrate.
Ideally the RF component's I/Q vectors should be orthogonal,
but usually they are not. So we need to calibrate for the RF
components, ex. PA/LNA, ADC/DAC.
And if the I/Q vectors are more orthogonal, the mixed signal
will have less deviation. This helps with those rates with
higher modulation (MCS8-9), because they have more strict
EVM/SNR requirement. Also the better of the quality of the
signal, the longer it can propagate, and the better throughput
performance we can get.
Tsang-Shian Lin [Mon, 9 Sep 2019 07:16:06 +0000 (15:16 +0800)]
rtw88: 8822c: Enable interrupt migration
Enable 8822C Tx/Rx interrupt migration.
In some platforms, performance test may cause heavy cpu loading and get
bad results. Interrupt migration can decrease the amount of interrupts,
and lower cpu loading.
Larry Finger [Mon, 9 Sep 2019 01:59:57 +0000 (20:59 -0500)]
rtlwifi: rtl8723be: Convert inline routines to little-endian words
In this step, the read/write routines for the descriptors are converted
to use __le32 quantities, thus a lot of casts can be removed. Callback
routines still use the 8-bit arrays, but these are changed within the
specified routine.
The macro that cleared a descriptor has now been converted into an inline
routine.
Larry Finger [Mon, 9 Sep 2019 01:59:56 +0000 (20:59 -0500)]
rtlwifi: rtl8723be: Convert macros that set descriptor
As a first step in the conversion, the macros that set the RX and TX
descriptors are converted to static inline routines, and the names are
changed from upper to lower case. To minimize the changes in a given
step, the input descriptor information is left as as a byte array
(u8 *), even though it should be a little-endian word array (__le32 *).
That will be changed in the next patch.
Several places where checkpatch.pl complains about a space after a cast
are fixed.
Larry Finger [Mon, 9 Sep 2019 01:59:55 +0000 (20:59 -0500)]
rtlwifi: rtl8723be: Replace local bit manipulation macros
This driver uses a set of local macros to manipulate the RX and TX
descriptors, which are all little-endian quantities. These macros
are replaced by the bitfield macros le32p_replace_bits() and
le32_get_bits(). In several places, the macros operated on an entire
32-bit word. In these cases, a direct read or replacement is used.
Larry Finger [Mon, 9 Sep 2019 01:59:53 +0000 (20:59 -0500)]
rtlwifi: rtl8723ae: Convert inline routines to little-endian words
In this step, the read/write routines for the descriptors are converted
to use __le32 quantities, thus a lot of casts can be removed. Callback
routines still use the 8-bit arrays, but these are changed within the
specified routine.
The macro that cleared a descriptor has now been converted into an inline
routine.
Larry Finger [Mon, 9 Sep 2019 01:59:52 +0000 (20:59 -0500)]
rtlwifi: rtl8723ae: Convert macros that set descriptor
As a first step in the conversion, the macros that set the RX and TX
descriptors are converted to static inline routines, and the names are
changed from upper to lower case. To minimize the changes in a given
step, the input descriptor information is left as as a byte array
(u8 *), even though it should be a little-endian word array (__le32 *).
That will be changed in the next patch.
Several places where checkpatch.pl reports lines too long are fixed.
Larry Finger [Mon, 9 Sep 2019 01:59:51 +0000 (20:59 -0500)]
rtlwifi: rtl8723ae: Replace local bit manipulation macros
This driver uses a set of local macros to manipulate the RX and TX
descriptors, which are all little-endian quantities. These macros
are replaced by the bitfield macros le32p_replace_bits() and
le32_get_bits(). In several places, the macros operated on an entire
32-bit word. In these cases, a direct read or replacement is used.
libertas: use mesh_wdev->ssid instead of priv->mesh_ssid
With the commit e86dc1ca4676 ("Libertas: cfg80211 support") we've lost
the ability to actually set the Mesh SSID from userspace.
NL80211_CMD_SET_INTERFACE with NL80211_ATTR_MESH_ID sets the mesh point
interface's ssid field. Let's use that one for the Libertas Mesh
operation
Felipe Balbi [Wed, 11 Sep 2019 06:16:22 +0000 (09:16 +0300)]
PTP: add support for one-shot output
Some controllers allow for a one-shot output pulse, in contrast to
periodic output. Now that we have extensible versions of our IOCTLs, we
can finally make use of the 'flags' field to pass a bit telling driver
that if we want one-shot pulse output.
Felipe Balbi [Wed, 11 Sep 2019 06:16:21 +0000 (09:16 +0300)]
PTP: introduce new versions of IOCTLs
The current version of the IOCTL have a small problem which prevents us
from extending the API by making use of reserved fields. In these new
IOCTLs, we are now making sure that flags and rsv fields are zero which
will allow us to extend the API in the future.
Jeff Kirsher [Thu, 12 Sep 2019 19:07:34 +0000 (12:07 -0700)]
ixgbevf: Fix secpath usage for IPsec Tx offload
Port the same fix for ixgbe to ixgbevf.
The ixgbevf driver currently does IPsec Tx offloading
based on an existing secpath. However, the secpath
can also come from the Rx side, in this case it is
misinterpreted for Tx offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for Tx offload.
David S. Miller [Fri, 13 Sep 2019 13:50:48 +0000 (15:50 +0200)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-09-12
This series contains updates to ice driver to implement and support
loading a Dynamic Device Personalization (DDP) package from lib/firmware
onto the device.
Paul updates the way the driver version is stored in the driver so that
we can pass the driver version to the firmware. Passing of the driver
version to the firmware is needed for the DDP package to ensure we have
the appropriate support in the driver for the features in the package.
Lukasz fixes how the firmware version is stored to align with how the
firmware stores its own version. Also extended the log message to
display additional useful information such as NVM version, API patch
information and firmware build hash.
Tony adds the needed driver support to check, load and store the DDP
package. Also add support for the ability to load DDP packages intended
for specific hardware devices, as well as what to do when loading of the
DDP package fails to load.
====================
David S. Miller [Fri, 13 Sep 2019 13:46:38 +0000 (15:46 +0200)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-09-11
This series contains updates to i40e, ixgbe/vf and iavf.
Wenwen Wang fixes a potential memory leak where 3 allocated variables
are not properly cleaned up on failure for ixgbe.
Stefan Assmann fixes a potential kernel panic found when repeatedly
spawning and destroying VFs in i40e when a NULL pointer is dereferenced
due to a race condition. Fixed up the i40e driver to clear the
__I40E_VIRTCHNL_OP_PENDING bit before returning after an invalid
minimum transmit rate is requested. Updates the iavf driver to only
apply the MAC address change when the PF ACK's the requested change.
Tonghao Zhang updates ixgbe to use the skb_get_queue_mapping() API call
instead of the driver accessing the queue mapping directly.
Jake updates i40e to use ktime_get_real_ts64() instead of
ktime_to_timespec64(). Removes the define for bit 0x0001 for cloud
filters, since it is a reserved bit and not a valid type. Also added
code comments to clearly state which bits are reserved and should not be
used or defined for cloud filter adminq command. Clarify the macros
used to specify the cloud filter fields are individual bits, so use the
BIT() macro.
Aleksandr fixes up the print_link_message() to include the "negotiated"
FEC status for i40e.
Czeslaw also adds additional log message for devices without FEC in the
print_link_message() for i40e.
Colin Ian King reduces the object code size by making the array API
static constant.
Magnus fixes a potential receive buffer starvation issue for AF_XDP by
kicking the NAPI context of any queue with an attached AF_XDP zero-copy
socket.
v2: Removed patch 11 from the original series (Alex Duyck's ITR fix),
so that it can be sent to the net tree.
====================
Colin Ian King [Thu, 5 Sep 2019 15:00:22 +0000 (16:00 +0100)]
rtlwifi: rtl8821ae: make array static const and remove redundant assignment
The array channel_all can be make static const rather than populating
it on the stack, this makes the code smaller. Also, variable place
is being initialized with a value that is never read, so this assignment
is redundant and can be removed.
Before:
text data bss dec hex filename
118537 9591 0 128128 1f480 realtek/rtlwifi/rtl8821ae/phy.o
After:
text data bss dec hex filename
118331 9687 0 128018 1f412 realtek/rtlwifi/rtl8821ae/phy.o
Providing a new wiphy on every PCIe reset was confusing and was causing
configuration problems for some users (supplicant and authenticators).
Sticking to the existing wiphy should make error recovery much simpler
and more reliable.
brcmfmac: split brcmf_attach() and brcmf_detach() functions
Move code allocating/freeing wiphy out of above functions. This will
allow reinitializing the driver (e.g. on some error) without allocating
a new wiphy.
brcmfmac: move "cfg80211_ops" pointer to another struct
This moves "ops" pointer from "struct brcmf_cfg80211_info" to the
"struct brcmf_pub". This movement makes it possible to allocate wiphy
without attaching cfg80211 (brcmf_cfg80211_attach()). It's required for
later separation of wiphy allocation and driver initialization.
While at it fix also an unlikely memory leak in the brcmf_attach().
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Fix error path of nf_tables_updobj(), from Dan Carpenter.
2) Move large structure away from stack in the nf_tables offload
infrastructure, from Arnd Bergmann.
3) Move indirect flow_block logic to nf_tables_offload.
4) Support for synproxy objects, from Fernando Fernandez Mancera.
5) Support for fwd and dup offload.
6) Add __nft_offload_get_chain() helper, this implicitly fixes missing
mutex and check for offload flags in the indirect block support,
patch from wenxu.
7) Remove rules on device unregistration, from wenxu. This includes
two preparation patches to reuse nft_flow_offload_chain() and
nft_flow_offload_rule().
Large batch from Jeremy Sowden to make a second pass to the
CONFIG_HEADER_TEST support and a bit of housekeeping:
8) Missing include guard in conntrack label header, from Jeremy Sowden.
9) A few coding style errors: trailing whitespace, incorrect indent in
Kconfig, and semicolons at the end of function definitions.
10) Remove unused ipt_init() and ip6t_init() declarations.
11) Inline xt_hashlimit, ebt_802_3 and xt_physdev headers. They are
only used once.
12) Update include directive in several netfilter files.
14) Move nf_ip6_ext_hdr() to include/linux/netfilter_ipv6.h
15) Move several synproxy structure definitions to nf_synproxy.h
16) Move nf_bridge_frag_data structure to include/linux/netfilter_bridge.h
17) Clean up static inline definitions in nf_conntrack_ecache.h.
18) Replace defined(CONFIG...) || defined(CONFIG...MODULE) with IS_ENABLED(CONFIG...).
19) Missing inline function conditional definitions based on Kconfig
preferences in synproxy and nf_conntrack_timeout.
20) Update br_nf_pre_routing_ipv6() definition.
21) Move conntrack code in linux/skbuff.h to nf_conntrack headers.
22) Several patches to remove superfluous CONFIG_NETFILTER and
CONFIG_NF_CONNTRACK checks in headers, coming from the initial batch
support for CONFIG_HEADER_TEST for netfilter.
====================
mmc: tmio: Fixup runtime PM management during probe
The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
runtime PM status of the device, as to make it reflect the current status
of the HW. This works fine for most cases, but unfortunate not for all.
Especially, there is a generic problem when the device has a genpd attached
and that genpd have the ->start|stop() callbacks assigned.
More precisely, if the driver calls pm_runtime_set_active() during
->probe(), genpd does not get to invoke the ->start() callback for it,
which means the HW isn't really fully powered on. Furthermore, in the next
phase, when the device becomes runtime suspended, genpd will invoke the
->stop() callback for it, potentially leading to usage count imbalance
problems, depending on what's implemented behind the callbacks of course.
To fix this problem, convert to call pm_runtime_get_sync() from
tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
avoid bumping usage counters and unnecessary re-initializing the HW the
first time the tmio driver's ->runtime_resume() callback is called,
introduce a state flag to keeping track of this.
It turns out that the above commit introduces other problems. For example,
calling pm_runtime_set_active() must not be done prior calling
pm_runtime_enable() as that makes it fail. This leads to additional
problems, such as clock enables being wrongly balanced.
Rather than fixing the problem on top, let's start over by doing a revert.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:16 +0000 (09:13 +0100)]
netfilter: remove CONFIG_NETFILTER checks from headers.
`struct nf_hook_ops`, `struct nf_hook_state` and the `nf_hookfn`
function typedef appear in function and struct declarations and
definitions in a number of netfilter headers. The structs and typedef
themselves are defined by linux/netfilter.h but only when
CONFIG_NETFILTER is enabled. Define them unconditionally and add
forward declarations in order to remove CONFIG_NETFILTER conditionals
from the other headers.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:14 +0000 (09:13 +0100)]
netfilter: conntrack: move code to linux/nf_conntrack_common.h.
Move some `struct nf_conntrack` code from linux/skbuff.h to
linux/nf_conntrack_common.h. Together with a couple of helpers for
getting and setting skb->_nfct, it allows us to remove
CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:13 +0000 (09:13 +0100)]
netfilter: br_netfilter: update stub br_nf_pre_routing_ipv6 parameter to `void *priv`.
The real br_nf_pre_routing_ipv6 function, defined when CONFIG_IPV6 is
enabled, expects `void *priv`, not `const struct nf_hook_ops *ops`.
Update the stub br_nf_pre_routing_ipv6, defined when CONFIG_IPV6 is
disabled, to match.
Fixes: 06198b34a3e0 ("netfilter: Pass priv instead of nf_hook_ops to netfilter hooks") Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Jeremy Sowden [Fri, 13 Sep 2019 08:13:12 +0000 (09:13 +0100)]
netfilter: conntrack: wrap two inline functions in config checks.
nf_conntrack_synproxy.h contains three inline functions. The contents
of two of them are wrapped in CONFIG_NETFILTER_SYNPROXY checks and just
return NULL if it is not enabled. The third does nothing if they return
NULL, so wrap its contents as well.
nf_ct_timeout_data is only called if CONFIG_NETFILTER_TIMEOUT is
enabled. Wrap its contents in a CONFIG_NETFILTER_TIMEOUT check like the
other inline functions in nf_conntrack_timeout.h.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:09 +0000 (09:13 +0100)]
netfilter: move nf_bridge_frag_data struct definition to a more appropriate header.
There is a struct definition function in nf_conntrack_bridge.h which is
not specific to conntrack and is used elswhere in netfilter. Move it
into netfilter_bridge.h.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:07 +0000 (09:13 +0100)]
netfilter: move inline nf_ip6_ext_hdr() function to a more appropriate header.
There is an inline function in ip6_tables.h which is not specific to
ip6tables and is used elswhere in netfilter. Move it into
netfilter_ipv6.h and update the callers.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:06 +0000 (09:13 +0100)]
netfilter: remove nf_conntrack_icmpv6.h header.
nf_conntrack_icmpv6.h contains two object macros which duplicate macros
in linux/icmpv6.h. The latter definitions are also visible wherever it
is included, so remove it.
Jeremy Sowden [Fri, 13 Sep 2019 08:13:02 +0000 (09:13 +0100)]
netfilter: fix coding-style errors.
Several header-files, Kconfig files and Makefiles have trailing
white-space. Remove it.
In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT
correctly.
There are semicolons at the end of two function definitions in
include/net/netfilter/nf_conntrack_acct.h and
include/net/netfilter/nf_conntrack_ecache.h. Remove them.
Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
and reported by many users. It took time to identify the root cause,
catching the 5.3 release is higly desired also to push the fix to 5.2
stable tree.
The bug is a mess up of return values after adding proper error
handling and honestly the kind of bug that can cause sleeping
disorders until it's caught. My appologies to everybody who was
affected.
Summary of what could happen:
1) either a hang when committing a transaction, if this happens
there's no risk of corruption, still the hang is very inconvenient
and can't be resolved without a reboot
2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that, this is really
serious and that will lead to the "parent transid verify failed"
messages"
* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
Btrfs: fix assertion failure during fsync and use of stale transaction
netfilter: nf_tables_offload: add __nft_offload_get_chain function
Add __nft_offload_get_chain function to get basechain from device. This
function requires that caller holds the per-netns nftables mutex. This
patch implicitly fixes missing offload flags check and proper mutex from
nft_indr_block_cb().
Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Roman Gushchin [Thu, 12 Sep 2019 17:56:45 +0000 (10:56 -0700)]
cgroup: freezer: fix frozen state inheritance
If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.
The problem can be reproduced with the test_cgfreezer_mkdir test.
This is the output before this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
not ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
And with this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
Roman Gushchin [Thu, 12 Sep 2019 17:56:44 +0000 (10:56 -0700)]
kselftests: cgroup: add freezer mkdir test
Add a new cgroup freezer selftest, which checks that if a cgroup is
frozen, their new child cgroups will properly inherit the frozen
state.
It creates a parent cgroup, freezes it, creates a child cgroup
and populates it with a dummy process. Then it checks that both
parent and child cgroup are frozen.
Tony Nguyen [Mon, 9 Sep 2019 13:47:46 +0000 (06:47 -0700)]
ice: Enable DDP package download
Attempt to request an optional device-specific DDP package file
(one with the PCIe Device Serial Number in its name so that different DDP
package files can be used on different devices). If the optional package
file exists, download it to the device. If not, download the default
package file.
Log an appropriate message based on whether or not a DDP package
file exists and the return code from the attempt to download it to the
device. If the download fails and there is not already a package file on
the device, go into "Safe Mode" where some features are not supported.
Tony Nguyen [Mon, 9 Sep 2019 13:47:45 +0000 (06:47 -0700)]
ice: Initialize DDP package structures
Add functions to initialize, parse, and clean structures representing
the DDP package.
Upon completion of package download, read and store the DDP package
contents to these structures. This configuration is used to
identify the default behavior and later used to update the HW table
entries.
Add the required defines, structures, and functions to enable downloading
a DDP package. Before download, checks are performed to ensure the package
is valid and compatible.
Note that package download is not yet requested by the driver as further
initialization is required to utilize the package.
The FW build id is currently being displayed as an int which doesn't make
sense. Instead display FW build id as a hex value. Also add other useful
information to the output such as NVM version, API patch info, and FW
build hash.
The driver is required to send a version to the firmware
to indicate that the driver is up. If the driver doesn't
do this the firmware doesn't behave properly.
Lior David [Tue, 10 Sep 2019 13:46:43 +0000 (16:46 +0300)]
wil6210: ignore reset errors for FW during probe
There are special kinds of FW such as WMI only which
are used for testing, diagnostics and other specific
scenario.
Such FW is loaded during driver probe and the driver
disallows enabling any network interface, to avoid
operational issues.
In many cases it is used to debug early versions
of FW with new features, which sometimes fail
on startup.
Currently when such FW fails to load (for example,
because of init failure), the driver probe would fail
and shutdown the device making it difficult to debug
the early failure.
To fix this, ignore load failures in WMI only FW and
allow driver probe to succeed, making it possible to
continue and debug the FW load failure.
Lior David [Tue, 10 Sep 2019 13:46:41 +0000 (16:46 +0300)]
wil6210: fix RX short frame check
The short frame check in wil_sring_reap_rx_edma uses
skb->len which store the maximum frame length. Fix
this to use dmalen which is the actual length of
the received frame.