Git Repo - linux.git/log

iwlwifi: add TX queue size parameter to TX queue allocation

As preparation for dynamic queue sizing, add a parameter
of the TX queue size to the dynamic queue allocation op
mode API.

Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: Revert "iwlwifi: pcie: dynamic Tx command queue size"

This reverts commit dd05f9aab4426ff178b12d601e50d19d336eba30.

Shorter TX queues support was added eventually without the
need for the parameters this patch added.

Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: pcie: allocate shorter TX queues for 22000 devices

When support for shorter TX queues was introduced, it
didn't include the actual allocation of shorter queue,
which is the main motive for the change.

Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: move timestamp functions from debugfs.h to dbg.h

These functions are not debugfs functions so they should be in dbg.h
instad in debugfs.h.

Signed-off-by: Haim Dreyfuss <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: pcie: implement the overlow queue for Gen2 devices

When we enable TSO, we can have a lot of packets in the
operation mode that will be pushed to the transport
no matter what is the queue's fullness state.

To cope with that the transport can buffer those packets
and add them to the ring later when there is more room.
This implementation was missing in the Gen2 devices'
code.

Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: support offload of AMSDU rate control

Support the new APIs and activate AMSDU based on the
offloaded TLC decisions.

Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: fix OOC priority in scans

The code that sets the correct out-of-channel priority depending on
the scan type was accidentally removed during a rebase. Add it back.

Fixes: c1a7515393e4 ("iwlwifi: mvm: add adaptive dwell support")
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: clean up scan capability checks

Introduce and use iwl_mvm_cdb_scan_api(), which checks the family.
Most of this will go away once the 22000 firmware supports adaptive
dwell, after which the V6 scan API won't be used, but the V3 scan
*config* API will still need to be distinguished.

In any case, this gets rid of the completely bogus has_new_tx_api()
checks.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: detect low latency and traffic load per band

Detect low latency and traffic load per band. Add support for
deciding on scan type and timings per band.

Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: detect U-APSD breaking aggregation

Try to detect that the AP is not using aggregation even when there's
enough traffic to make it worthwhile; if this is the case and U-APSD
is enabled then assume the AP is broken (like so many) and doesn't
enable aggregation when U-APSD is used. In this case, disconnect from
the AP and blacklist U-APSD for a potential new connection to it.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: BT Coex - make the primary / secondary pick traffic aware

The primary channel is the channel that will be untouched by BT. The
secondary channel might be touched by BT. Hence, we want the primary
to be the most active channel. To do so, use the TCM infrastructure.

Since the BT keeps sending notifications, we can rely on them to
trigger the check. Every 10 seconds, we will check what is the most
active context and chose the right primary.

We need to wait 10 seconds before we modify the settings because
frequent changes in these settings can confuse BT.

Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: use TCM data to decide scan priority

The code for changing the scan priority is already implemented, but
isn't yet in use. Now that TCM data is available, we can base the
scan priority decision on the traffic load.

Signed-off-by: Luca Coelho <[email protected]>

iwlwifi: mvm: add traffic condition monitoring (TCM)

Traffic condition monitor gathers data about the traffic load and
other conditions and can be used to make decisions regarding latency,
throughput etc. This patch introduces the code and data structures to
collect this data for future use.

Signed-off-by: Luca Coelho <[email protected]>

Don't leak MNT_INTERNAL away from internal mounts

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: [email protected]
Reported-by: Alexander Aring <[email protected]>
Bisected-by: Kirill Tkhai <[email protected]>
Tested-by: Kirill Tkhai <[email protected]>
Tested-by: Alexander Aring <[email protected]>
Signed-off-by: Al Viro <[email protected]>

net/smc: fix shutdown in state SMC_LISTEN

Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
crashes, because
commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
releases the internal clcsock in smc_close_active() and sets smc->clcsock
to NULL.
For SHUT_RD the smc_close_active() call is removed.
For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
clcsock is already released.

Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
Signed-off-by: Ursula Braun <[email protected]>
Reported-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Fix memory fault in bnxt_ethtool_init()

In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type
could be greater than the fixed buffer length of 4096 bytes allocated by
the driver. This was causing HWRM_NVM_READ to copy more data to the buffer
than the allocated size, causing general protection fault.

Fix the issue by allocating the exact buffer length returned by
HWRM_NVM_FIND_DIR_ENTRY, instead of 4096. Move the kzalloc() call
into the bnxt_get_pkgver() function.

Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO")
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'virtio-ctrl-buffer-fixes'

Michael S. Tsirkin says:

====================
virtio: ctrl buffer fixes

Here are a couple of fixes related to the virtio control buffer.
Lightly tested on x86 only.
====================

Signed-off-by: David S. Miller <[email protected]>

virtio_net: sparse annotation fix

offloads is a buffer in virtio format, should use
the __virtio64 tag.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio_net: fix adding vids on big-endian

Programming vids (adding or removing them) still passes
guest-endian values in the DMA buffer. That's wrong
if guest is big-endian and when virtio 1 is enabled.

Note: this is on top of a previous patch:
virtio_net: split out ctrl buffer

Fixes: 9465a7a6f ("virtio_net: enable v1.0 support")
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio_net: split out ctrl buffer

When sending control commands, virtio net sets up several buffers for
DMA. The buffers are all part of the net device which means it's
actually allocated by kvmalloc so it's in theory (on extreme memory
pressure) possible to get a vmalloc'ed buffer which on some platforms
means we can't DMA there.

Fix up by moving the DMA buffers into a separate structure.

Reported-by: Mikulas Patocka <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Avoid action name truncation

When longer interface names are used, the action names exposed in
/proc/interrupts and /proc/irq/* maybe truncated. For example, when
using the predictable name algorithm in systemd on a HiSilicon D05,
I see:

  ubuntu@d05-3:~$  grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
  enahisic2i0-tx0
  enahisic2i0-tx1
  [...]
  enahisic2i0-tx8
  enahisic2i0-tx9
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1
  enahisic2i0-tx1

Increase the max ring name length to allow for an interface name
of IFNAMSIZE. After this change, I now see:

  $ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
  enahisic2i0-tx0
  enahisic2i0-tx1
  enahisic2i0-tx2
  [...]
  enahisic2i0-tx8
  enahisic2i0-tx9
  enahisic2i0-tx10
  enahisic2i0-tx11
  enahisic2i0-tx12
  enahisic2i0-tx13
  enahisic2i0-tx14
  enahisic2i0-tx15

Signed-off-by: dann frazier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'Amiga-xsurf100'

Michael Schmitz says:

====================
New network driver for Amiga X-Surf 100 (m68k)

[This is a resend of my v3 series which was based on the wrong version and
tree. Only substantial change is to Asix AX99796B PHY driver.]

This patch series adds support for the Individual Computers X-Surf 100
network card for m68k Amiga, a network adapter based on the AX88796 chip set.

The driver was originally written for kernel version 3.19 by Michael Karcher
(see CC:), and adapted to 4.16+ for submission to netdev by me. Questions
regarding motivation for some of the changes are probably best directed at
Michael Karcher.

The driver has been tested by Adrian <[email protected]> who will
send his Tested-by tag separately.

A few changes to the ax88796 driver were required:
- to read the MAC address, some setup of the ax99796 chip must be done,
- attach to the MII bus only on device open to allow module unloading,
- allow to supersede ax_block_input/ax_block_output by card-specific
  optimized code,
- use an optional interrupt status callback to allow easier sharing of the
  card interrupt,
- set IRQF_SHARED if platform IRQ resource is marked shareable

The Asix Electronix PHY used on the X-Surf 100 is buggy, and causes the
software reset to hang if the previous command sent to the PHY was also
a soft reset. This bug requires addition of a PHY driver for Asix PHYs
to provide a fixed .soft_reset function, included in this series.

Some additional cleanup:
- do not attempt to free IRQ in ax_remove (complements 82533ad9a1c),
- clear platform drvdata on probe fail and module remove.

Changes since v1:

Raised in review by Andrew Lunn:
- move MII code around to avoid need for forward declaration,
- combine patches 2 and 7 to add cleanup in error path

Changes since v2:

- corrected authorship attribution to Michael Karcher

Suggested by Geert Uytterhoeven:
- use ei_local->reset_8390() instead of duplicating ax_reset_8390(),
- use %pR to format struct resource pointers,
- assign pdev and xs100 pointers in declaration,
- don't split error messages,
- change Kconfig logic to only require XSURF100 set on Amiga

Suggested by Andrew Lunn:
- add COMPILE_TEST to ax88796 Kconfig options,
- use new Asix PHY driver for X-Surf 100

Suggested by Andrew Lunn/Finn Thain:
- declare struct sk_buff in ax88796.h,
- correct whitespace error in ax88796.h

Changes since v3:

- various checkpatch cleanup

Andrew Lunn:
- don't duplicate genphy_soft_reset in Asix PHY driver, just call
  genphy_soft_reset after writing zero to control register
====================

Signed-off-by: David S. Miller <[email protected]>

net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)

Add platform device driver to populate the ax88796 platform data from
information provided by the XSurf100 zorro device driver. The ax88796
module will be loaded through this module's probe function.

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: release platform device drvdata on probe error and module remove

The net device struct pointer is stored as platform device drvdata on
module probe - clear the drvdata entry on probe fail there, as well as
when unloading the module.

Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: set IRQF_SHARED flag when IRQ resource is marked as shareable

On the Amiga X-Surf100, the network card interrupt is shared with many
other interrupt sources, so requires the IRQF_SHARED flag to register.

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: add interrupt status callback to platform data

To be able to tell the ax88796 driver whether it is sensible to enter
the 8390 interrupt handler, an "is this interrupt caused by the 88796"
callback has been added to the ax_plat_data structure (with NULL being
compatible to the previous behaviour).

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: Add block_input/output hooks to ax_plat_data

Add platform specific hooks for block transfer reads/writes of packet
buffer data, superseding the default provided ax_block_input/output.
Currently used for m68k Amiga XSurf100.

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).

This complements the fix in 82533ad9a1c ("net: ethernet: ax88796:
don't call free_irq without request_irq first") that removed the
free_irq call in the error path of probe, to also not call free_irq
when remove is called to revert the effects of probe.

Fixes: 82533ad9a1c (net: ethernet: ax88796: don't call free_irq without request_irq first)
Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: Attach MII bus only when open

Call ax_mii_init in ax_open(), and unregister/remove mdiobus resources
in ax_close().

This is needed to be able to unload the module, as the module is busy
while the MII bus is attached.

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: ax88796: Fix MAC address reading

To read the MAC address from the (virtual) SAprom, the remote DMA
unit needs to be set up like for every other process access to card-local
memory.

Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next: phy: new Asix Electronics PHY driver

The Asix Electronics PHY found on the X-Surf 100 Amiga Zorro network
card by Individual Computers is buggy, and needs the reset bit toggled
as workaround to make a PHY soft reset succeed.

Add workaround driver just for this special case.

Suggested in xsurf100 patch series review by Andrew Lunn <[email protected]>

Signed-off-by: Michael Schmitz <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'Modernize-mdio-gpio'

Andrew Lunn says:

====================
Modernize mdio-gpio

This patchset is inspired by a previous version by Linus Walleij

It reworks the mdio-gpio code to make use of gpio descriptors instead
of gpio numbers. However compared to the previous version, it retains
support for platform devices. It does however remove the platform_data
header file. The needed GPIOs are now passed by making use of a gpiod
lookup table. e.g:

static struct gpiod_lookup_table zii_scu_mdio_gpiod_table = {
.dev_id = "mdio-gpio.0",
.table = {
GPIO_LOOKUP_IDX("gpio_ich", 17, NULL, MDIO_GPIO_MDC,
GPIO_ACTIVE_HIGH),
GPIO_LOOKUP_IDX("gpio_ich", 2, NULL, MDIO_GPIO_MDIO,
GPIO_ACTIVE_HIGH),
GPIO_LOOKUP_IDX("gpio_ich", 21, NULL, MDIO_GPIO_MDO,
GPIO_ACTIVE_LOW),
},
};
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Remove redundant platform data header

The platform data header file is now unused. Remove it, but add
an extra include which it brought in.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Add #defines for the GPIO index's

The GPIOs are described in device tree using a list, without names.
Add defines to indicate what each index in the list means. These
defines should also be used by platform devices passing GPIOs via a
GPIO lookup table.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Parse properties directly into bitbang structure

The same parsing code can be used for both OF and platform devices, if
the platform device uses a gpiod_lookup_table. Parse these properties
directly into the bitbang structure, rather than use an intermediate
platform data structure.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Move allocation for bitbanging data

Moving the allocation of this structure to the probe function is a
step towards making it the core data structure of the driver.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Swap to using gpio descriptors

This simplifies the code, removing the need to handle active low
flags, etc.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Remove support for IRQs in platform data

No current devices use IRQs in platform data, so remove support for
it. The MDIO core will also initialise the new bus such that all
addresses are polled, so remove the unneeded re-initialisation.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: remove support for phy mask

This is not needed any more by devices using platform data, so remove
it.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: remove support for ignoring turn around

This is not needed any more by devices using platform data, so remove
it.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-bitbang: Remove reset support

The mdio-gpio driver was the only user of the interface reset option.
Since it no longer uses it, remove it from the bit banging code.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Remove reset function

The platform data can contain a function to call to reset
the bit banging interface. It is not used, so remove it.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy_ mdio-gpio: Fixup , which should be ;

Seems like an old typ0.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bpf-type-format'

Martin KaFai Lau says:

====================
This patch introduces BPF Type Format (BTF).

BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map.  Hence, it basically focus
on the C programming language which the modern BPF is primary
using.  The first use case is to provide a generic pretty print
capability for a BPF map.

A modified pahole that can convert dwarf to BTF is here:

  https://github.com/iamkafai/pahole/tree/btf

Please see individual patch for details.

v5:
- Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
  currently used.  They can be added in the future.
  Some bpf_df_xxx() are removed together.
- Add comment in patch 7 to clarify that the new bpffs_map_fops
  should not be extended further.

v4:
- Fix warning (remove unneeded semicolon)
- Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
  patch 1.  Caught by W=1.

v3:
- Rebase to bpf-next
- Fix sparse warning (by adding static)
- Add BTF header logging: btf_verifier_log_hdr()
- Fix the alignment test on btf->type_off
- Add tests for the BTF header
- Lower the max BTF size to 16MB.  It should be enough
  for some time.  We could raise it later if it would
  be needed.

v2:
- Use kvfree where needed in patch 1 and 2
- Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
  in patch 1
- Fix an incorrect goto target in map_create() during
  the btf-error-path in patch 7
- re-org some local vars to keep the rev xmas tree in btf.c
====================

Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add BTF tests

This patch tests the BTF loading, map_create with BTF
and the changes in libbpf.

-r: Raw tests that test raw crafted BTF data
-f: Test LLVM compiled bpf prog with BTF data
-g: Test BPF_OBJ_GET_INFO_BY_FD for btf_fd
-p: Test pretty print

The tools/testing/selftests/bpf/Makefile will probe
for BTF support in llc and pahole before generating
debug info (-g) and convert them to BTF.  You can supply
the BTF supported binary through the following make variables:
LLC, BTF_PAHOLE and LLVM_OBJCOPY.

LLC: The lastest llc with -mattr=dwarfris support for the bpf target.
     It is only in the master of the llvm repo for now.
BTF_PAHOLE: The modified pahole with BTF support:
    https://github.com/iamkafai/pahole/tree/btf
    To add a BTF section: "pahole -J bpf_prog.o"
LLVM_OBJCOPY: Any llvm-objcopy should do

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add BTF support to libbpf

If the ".BTF" elf section exists, libbpf will try to create
a btf_fd (through BPF_BTF_LOAD). If that fails, it will still
continue loading the bpf prog/map without the BTF.

If the bpf_object has a BTF loaded, it will create a map with the btf_fd.
libbpf will try to figure out the btf_key_id and btf_value_id of a map by
finding the BTF type with name "<map_name>_key" and "<map_name>_value".
If they cannot be found, it will continue without using the BTF.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Sync bpf.h and btf.h to tools/

This patch sync up the bpf.h and btf.h to tools/

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add pretty print support to the basic arraymap

This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
int count_a;
int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd

This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
The original BTF data, which was used to create the BTF fd during
the earlier BPF_BTF_LOAD call, will be returned.

The userspace is expected to allocate buffer
to info.info and the buffer size is set to info.info_len before
calling BPF_OBJ_GET_INFO_BY_FD.

The original BTF data is copied to the userspace buffer (info.info).
Only upto the user's specified info.info_len will be copied.

The original BTF data size is set to info.info_len. The userspace
needs to check if it is bigger than its allocated buffer size.
If it is, the userspace should realloc with the kernel-returned
info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add BPF_BTF_LOAD command

This patch adds a BPF_BTF_LOAD command which
1) loads and verifies the BTF (implemented in earlier patches)
2) returns a BTF fd to userspace. In the next patch, the
BTF fd can be specified during BPF_MAP_CREATE.

It currently limits to CAP_SYS_ADMIN.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Add pretty print capability for data with BTF type info

This patch adds pretty print capability for data with BTF type info.
The current usage is to allow pretty print for a BPF map.

The next few patches will allow a read() on a pinned map with BTF
type info for its key and value.

This patch uses the seq_printf() infra.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Check members of struct/union

This patch checks a few things of struct's members:

1) It has a valid size (e.g. a "const void" is invalid)
2) A member's size (+ its member's offset) does not exceed
the containing struct's size.
3) The member's offset satisfies the alignment requirement

The above can only be done after the needs_resolve member's type
is resolved. Hence, the above is done together in
btf_struct_resolve().

Each possible member's type (e.g. int, enum, modifier...) implements
the check_member() ops which will be called from btf_struct_resolve().

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Validate type reference

After collecting all btf_type in the first pass in an earlier patch,
the second pass (in this patch) can validate the reference types
(e.g. the referring type does exist and it does not refer to itself).

While checking the reference type, it also gathers other information (e.g.
the size of an array). This info will be useful in checking the
struct's members in a later patch. They will also be useful in doing
pretty print later.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Introduce BPF Type Format (BTF)

This patch introduces BPF type Format (BTF).

BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map.  Hence, it basically focus
on the C programming language which the modern BPF is primary
using.  The first use case is to provide a generic pretty print
capability for a BPF map.

BTF has its root from CTF (Compact C-Type format).  To simplify
the handling of BTF data, BTF removes the differences between
small and big type/struct-member.  Hence, BTF consistently uses u32
instead of supporting both "one u16" and "two u32 (+padding)" in
describing type and struct-member.

It also raises the number of types (and functions) limit
from 0x7fff to 0x7fffffff.

Due to the above changes,  the format is not compatible to CTF.
Hence, BTF starts with a new BTF_MAGIC and version number.

This patch does the first verification pass to the BTF.  The first
pass checks:
1. meta-data size (e.g. It does not go beyond the total btf's size)
2. name_offset is valid
3. Each BTF_KIND (e.g. int, enum, struct....) does its
   own check of its meta-data.

Some other checks, like checking a struct's member is referring
to a valid type, can only be done in the second pass.  The second
verification pass will be implemented in the next patch.

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Merge branch 'ipv6-followup-to-fib6_info-change'

David Ahern says:

====================
net/ipv6: followup to fib6_info change

Followup to fib change for IPv6.

First 2 patches rename fib6_info struct elements to match its name,
and rename addrconf_dst_alloc to match what it returns.

Patches 3-7 refactor the code to remove the need for fib6_idev reducing
fib6_info by another 8 bytes to 200 bytes.

Patch 8 fixes the gfp flags argument to addrconf_prefix_route in a
couple of places.
====================

Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Fix gfp_flags arg to addrconf_prefix_route

Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so
the gfp argument to addrconf_prefix_route can not be GFP_KERNEL.

While scrubbing other calls I noticed addrconf_addr_gen has one
place with GFP_ATOMIC that can be GFP_KERNEL.

Fixes: acb54e3cba404 ("net/ipv6: Add gfp_flags to route add functions")
Reported-by: [email protected]
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Remove fib6_idev

fib6_idev can be obtained from __in6_dev_get on the nexthop device
rather than caching it in the fib6_info. Remove it.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Remove compare of fib6_idev from rt6_duplicate_nexthop

After 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") the comparison of idev does not add value since it
correlates to the nexthop device which is already compared. Remove
the idev comparison.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Change ip6_route_get_saddr to get dev from route

Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change ip6_route_get_saddr can just pass the nexthop
device to ipv6_dev_get_saddr.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Remove unnecessary checks on fib6_idev

Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change a couple of device checks during route lookups
are no longer needed. Remove them.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Remove aca_idev

aca_idev has only 1 user - inet6_fill_ifacaddr - and it only
wants the device index which can be extracted from the fib6_info
nexthop.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Rename addrconf_dst_alloc

addrconf_dst_alloc now returns a fib6_info. Update the name
and its users to reflect the change.

Rename only; no functional change intended.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Rename fib6_info struct elements

Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.

Rename only; not functional change intended.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

docs: ip-sysctl.txt: fix name of some ipv6 variables

The name of the following proc/sysctl entries were incorrectly
documented:

    /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
    /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
    /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
    /proc/sys/net/ipv6/conf/<interface>/max_hbt_length

Their name was set to the name of the symbol in the .data field of the
control table instead of their .proc name.

Signed-off-by: Olivier Gayot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vmxnet3: fix incorrect dereference when rxvlan is disabled

vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, vlan tag is present in the header and the function references
ip header from sizeof(ethhdr) and leads to incorrect pointer reference.

This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if vlan tag is present and correctly references the ip hdr.

Signed-off-by: Ronak Doshi <[email protected]>
Acked-by: Guolin Yang <[email protected]>
Acked-by: Louis Luo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

llc: hold llc_sap before release_sock()

syzbot reported we still access llc->sap in llc_backlog_rcv()
after it is freed in llc_sap_remove_socket():

Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
llc_conn_service net/llc/llc_conn.c:400 [inline]
llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x12f/0x3a0 net/core/sock.c:2335
release_sock+0xa4/0x2b0 net/core/sock.c:2850
llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204

llc->sap is refcount'ed and llc_sap_remove_socket() is paired
with llc_sap_add_socket(). This can be amended by holding its refcount
before llc_sap_remove_socket() and releasing it after release_sock().

Reported-by: <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: Direct networking documentation changes to netdev

Networking docs changes go through the networking tree, so patch the
MAINTAINERS file to direct authors to the right place.

Signed-off-by: Jonathan Corbet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"

Trivial fix to spelling mistake in message text.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qmi_wwan: add Wistron Neweb D19Q1

This modem is embedded on dlink dwr-960 router.
The oem configuration states:

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1435 ProdID=d191 Rev=ff.ff
S: Manufacturer=Android
S: Product=Android
S: SerialNumber=0123456789ABCDEF
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us

Tested on openwrt distribution

Signed-off-by: Pawel Dembicki <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"

Trivial fix to spelling mistake

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-next/hinic: add arm64 support

This patch enables arm64 platform support for the HINIC driver.

Signed-off-by: Zhao Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Disable ACS Feature for GMAC >= 4

ACS Feature is currently enabled for GMAC >= 4 but the llc_snap status
is never checked in descriptor rx_status callback. This will cause
stmmac to always strip packets even that ACS feature is already
stripping them.

Lets be safe and disable the ACS feature for GMAC >= 4 and always strip
the packets for this GMAC version.

Fixes: 477286b53f55 ("stmmac: add GMAC4 core support")
Signed-off-by: Jose Abreu <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Joao Pinto <[email protected]>
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: Fix DMA address mask size

PPv2 TX/RX descriptors uses 40bits DMA addresses, but 41 bits masks were
used (GENMASK_ULL(40, 0)).

This commit fixes that by using the correct mask.

Fixes: e7c5359f2eed ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors")
Signed-off-by: Maxime Chevallier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'TCP-data-delivery-and-ECN-stats-tracking'

Yuchung Cheng says:

====================
tracking TCP data delivery and ECN stats

This patch series improve tracking the data delivery status
  1. minor improvement on SYN data
  2. accounting bytes delivered with CE marks
  3. exporting the delivery stats to applications

s.t. users can get better sense of TCP performance at per host,
per connection, and even per application message level.
====================

Signed-off-by: David S. Miller <[email protected]>

tcp: export packets delivery info

Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS

Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.

These stats help application monitor TCP delivery and ECN status
on per host, per connection, even per message level.

Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: track total bytes delivered with ECN CE marks

Introduce a new delivered_ce stat in tcp socket to estimate
number of packets being marked with CE bits. The estimation is
done via ACKs with ECE bit. Depending on the actual receiver
behavior, the estimation could have biases.

Since the TCP sender can't really see the CE bit in the data path,
so the sender is technically counting packets marked delivered with
the "ECE / ECN-Echo" flag set.

With RFC3168 ECN, because the ECE bit is sticky, this count can
drastically overestimate the nummber of CE-marked data packets

With DCTCP-style ECN this should be reasonably precise unless there
is loss in the ACK path, in which case it's not precise.

With AccECN proposal this can be made still more precise, even in
the case some degree of ACK loss.

However this is sender's best estimate of CE information.

Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: new helper to calculate newly delivered

Add new helper tcp_newly_delivered() to prepare the ECN accounting change.

Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: better delivery accounting for SYN-ACK and SYN-data

the tcp_sock:delivered has inconsistent accounting for SYN and FIN.
1. it counts pure FIN
2. it counts pure SYN
3. it counts SYN-data twice
4. it does not count SYN-ACK

For congestion control perspective it does not matter much as C.C. only
cares about the difference not the aboslute value. But the next patch
would export this field to user-space so it's better to report the absolute
value w/o these caveats.

This patch counts SYN, SYN-ACK, or SYN-data delivery once always in
the "delivered" field.

Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: change the comment of dev_mc_init

The comment of dev_mc_init() is wrong. which use dev_mc_flush
instead of dev_mc_init.

Signed-off-by: Lianwen Sun <[email protected]
Signed-off-by: David S. Miller <[email protected]>

bpf: reserve xdp_frame size in xdp headroom

Commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on
page reuse") tried to allow user/bpf_prog to (re)use area used by
xdp_frame (stored in frame headroom), by memset clearing area when
bpf_xdp_adjust_head give bpf_prog access to headroom area.

The mentioned commit had two bugs. (1) Didn't take bpf_xdp_adjust_meta
into account. (2) a combination of bpf_xdp_adjust_head calls, where
xdp->data is moved into xdp_frame section, can cause clearing
xdp_frame area again for area previously granted to bpf_prog.

After discussions with Daniel, we choose to implement a simpler
solution to the problem, which is to reserve the headroom used by
xdp_frame info.

This also avoids the situation where bpf_prog is allowed to adjust/add
headers, and then XDP_REDIRECT later drops the packet due to lack of
headroom for the xdp_frame. This would likely confuse the end-user.

Fixes: 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs

Early revisions of certain SoCs cannot do multiple DMA RX streams in
parallel. To avoid data corruption, only allow one DMA RX channel and
fall back to PIO, if needed.

Signed-off-by: Wolfram Sang <[email protected]>
Reviewed-by: Yoshihiro Shimoda <[email protected]>
Tested-by: Nguyen Viet Dung <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>

HID: i2c-hid: fix inverted return value from i2c_hid_command()

i2c_hid_command() returns non-zero in error cases (the actual
errno). Error handling in for I2C_HID_QUIRK_RESEND_REPORT_DESCR
case in i2c_hid_resume() had the check inverted; fix that.

Fixes: 3e83eda467 ("HID: i2c-hid: Fix resume issue on Raydium touchscreen device")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

powerpc/kvm: Fix lockups when running KVM guests on Power8

When running KVM guests on Power8 we can see a lockup where one CPU
stops responding. This often leads to a message such as:

  watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
  Task dump for CPU 72:
  qemu-system-ppc R  running task    10560 20917  20908 0x00040004

And then backtraces on other CPUs, such as:

  Task dump for CPU 48:
  ksmd            R  running task    10032  1519      2 0x00000804
  Call Trace:
    ...
    --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
        LR = smp_call_function_many+0x37c/0x460
    pmdp_invalidate+0x100/0x1b0
    __split_huge_pmd+0x52c/0xdb0
    try_to_unmap_one+0x764/0x8b0
    rmap_walk_anon+0x15c/0x370
    try_to_unmap+0xb4/0x170
    split_huge_page_to_list+0x148/0xa30
    try_to_merge_one_page+0xc8/0x990
    try_to_merge_with_ksm_page+0x74/0xf0
    ksm_scan_thread+0x10ec/0x1ac0
    kthread+0x160/0x1a0
    ret_from_kernel_thread+0x5c/0x78

This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync
for KVM state when waking from idle"), which added a check in
pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
test of kvm_hstate.hwthread_req.

The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
entering the guest, so it can then come out to cede with
KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
setting hwthread_req to 1, but because hwthread_state is still
KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
wake up from idle and won't go to kvm_start_guest. From there the
thread will return somewhere garbage and crash.

Fix it by skipping the store of hwthread_state, but not the test of
hwthread_req, when coming out of idle. It's OK to skip the sync in
that case because hwthread_req will have been set on the same thread,
so there is no synchronisation required.

Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
Signed-off-by: Michael Ellerman <[email protected]>

ipv6: frags: fix a lockdep false positive

lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.

It complains because of following chains :

1) sch_direct_xmit()        (lock txq->_xmit_lock)
    dev_hard_start_xmit()
     xmit_one()
      dev_queue_xmit_nit()
       packet_rcv_fanout()
        ip_check_defrag()
         ip_defrag()
          spin_lock()     (lock frag queue spinlock)

2) ip6_input_finish()
    ipv6_frag_rcv()       (lock frag queue spinlock)
     ip6_frag_queue()
      icmpv6_param_prob() (lock txq->_xmit_lock at some point)

We could add lockdep annotations, but we also can make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes frag queue spinlock a leaf in lock hierarchy.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/eeh: Fix enabling bridge MMIO windows

On boot we save the configuration space of PCIe bridges. We do this so
when we get an EEH event and everything gets reset that we can restore
them.

Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence if we have to reset the bridge when we come back
MMIO is not enabled and we end up taking an PE freeze when the driver
starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space writes later, but that will have to come later in
a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:
echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On boston, this PHB has a PCIe switch on it. Without this patch,
you'll see two EEH events, 1 expected and 1 the failure we are fixing
here. The second EEH event causes the anything under the PHB to
disappear (i.e. the i40e eth).

With this patch, only 1 EEH event occurs and devices properly recover.

Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset")
Cc: [email protected] # v3.11+
Reported-by: Pridhiviraj Paidipeddi <[email protected]>
Signed-off-by: Michael Neuling <[email protected]>
Acked-by: Russell Currey <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

net: qualcomm: rmnet: Fix warning seen with fill_info

When the last rmnet device attached to a real device is removed, the
real device is unregistered from rmnet. As a result, the real device
lookup fails resulting in a warning when the fill_info handler is
called as part of the rmnet device unregistration.

Fix this by returning the rmnet flags as 0 when no real device is
present.

WARNING: CPU: 0 PID: 1779 at net/core/rtnetlink.c:3254
rtmsg_ifinfo_build_skb+0xca/0x10d
Modules linked in:
CPU: 0 PID: 1779 Comm: ip Not tainted 4.16.0-11872-g7ce2367 #1
Stack:
7fe655f0 60371ea3 00000000 00000000
60282bc6 6006b116 7fe65600 60371ee8
7fe65660 6003a68c 00000000 900000000
Call Trace:
[<6006b116>] ? printk+0x0/0x94
[<6001f375>] show_stack+0xfe/0x158
[<60371ea3>] ? dump_stack_print_info+0xe8/0xf1
[<60282bc6>] ? rtmsg_ifinfo_build_skb+0xca/0x10d
[<6006b116>] ? printk+0x0/0x94
[<60371ee8>] dump_stack+0x2a/0x2c
[<6003a68c>] __warn+0x10e/0x13e
[<6003a82c>] warn_slowpath_null+0x48/0x4f
[<60282bc6>] rtmsg_ifinfo_build_skb+0xca/0x10d
[<60282c4d>] rtmsg_ifinfo_event.part.37+0x1e/0x43
[<60282c2f>] ? rtmsg_ifinfo_event.part.37+0x0/0x43
[<60282d03>] rtmsg_ifinfo+0x24/0x28
[<60264e86>] dev_close_many+0xba/0x119
[<60282cdf>] ? rtmsg_ifinfo+0x0/0x28
[<6027c225>] ? rtnl_is_locked+0x0/0x1c
[<6026ca67>] rollback_registered_many+0x1ae/0x4ae
[<600314be>] ? unblock_signals+0x0/0xae
[<6026cdc0>] ? unregister_netdevice_queue+0x19/0xec
[<6026ceec>] unregister_netdevice_many+0x21/0xa1
[<6027c765>] rtnl_delete_link+0x3e/0x4e
[<60280ecb>] rtnl_dellink+0x262/0x29c
[<6027c241>] ? rtnl_get_link+0x0/0x3e
[<6027f867>] rtnetlink_rcv_msg+0x235/0x274

Fixes: be81a85f5f87 ("net: qualcomm: rmnet: Implement fill_info")
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation

This patch adds the NetVSP v6 and 6.1 message structures, and includes
these versions into NetVSC/NetVSP version negotiation process.

Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

hv_netvsc: propogate Hyper-V friendly name into interface alias

This patch implement the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V on the host through the GUI or PowerShell
it is possible to enable the device naming feature which causes
the host to make available to the guest the name of the device.
This shows up in the RNDIS protocol as the friendly name.

The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.

In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property; where it is visible in ip tools and SNMP.

The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.

Also requires exporting dev_set_alias in netdev core.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'r8169-series-with-further-smaller-improvements'

Heiner Kallweit says:

====================
r8169: series with further smaller improvements

This series includes further smaller improvements.

Then I think the basic cleanup has been done and next step would be
preparing the switch to phylib.
====================

Signed-off-by: David S. Miller <[email protected]>

r8169: remove jumbo_tx_csum from chip config struct

According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.

By implementing this rule we can get rid of configuring tx checksumming
support per chip type.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: improve pci region handling

The region to be used is always the first of type IORESOURCE_MEM.
We can implement this rule directly w/o having to specify which
region is the first one per configuration entry.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: drop member txd_version from struct rtl8169_private

txd_version is used in rtl_init_one() only, so we can drop member
txd_version from struct rtl8169_private.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: improve rtl8169_get_mac_version

Certain entries in array mac_info[] are redundant, so remove them:
0x7cf, 0x2c200000 (VER 33): matched by entry 0x7c8, 0x2c000000
0x7cf, 0x28300000 (VER 26): matched by entry 0x7c8, 0x28000000
0x7cf, 0x3cb00000 (VER 24): matched by entry 0x7c8, 0x3c800000
0x7cf, 0x3c400000 (VER 22): matched by entry 0x7c8, 0x3c000000
0x7cf, 0x38500000 (VER 17): matched by entry 0x7c8, 0x38000000
0x7cf, 0x44900000 (VER 39): matched by entry 0x7c8, 0x44800000
0x7cf, 0x40b00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x40a00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x34a00000 (VER 09): matched by entry 0x7c8, 0x34800000
0x7cf, 0x24a00000 (VER 09): matched by entry 0x7c8, 0x24800000

In addition don't mask out bits 30 and 29 when printing the XID.
Most likely this is a relict from the times when the driver covered
RTL8169 chip version only.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: don't display tp->mmio_addr address

For security reasons since commit ad67b74d2469 "printk: hash addresses
printed with %p" %p doesn't display the full address any longer.
We could switch to %px, but I think the pointer address doesn't
provide a real benefit, so remove printing the hashed address.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: drop member opts1_mask from struct rtl8169_private

We can get rid of member opts1_mask and in addition save a few cpu
cycles in the hot path of rtl_rx().

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: change interrupt handler argument type

Code can be a little simplified by switching the interrupt handler
argument type to struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: change argument type of counters handling functions

The counter handling functions don't deal with the net_device, so code
can be simplified by changing the argument type to
struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: change hw_start argument type

Code can be simplified by changing the argument type of hw_start
callbacks from struct net_device * to struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: remove rtl8169_map_to_asic

This function is very simple and used only once, so we can inline
the two statements.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>