Git Repo - linux.git/log

net: phy: replace bool members in struct phy_device with bit-fields

In struct phy_device we have a number of flags being defined as type
bool. Similar to e.g. struct pci_dev we can save some space by using
bit-fields.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
"A one-liner that prevents leaking an internal error value 1 out of the
  ftruncate syscall.

  This has been observed in practice. The steps to reproduce make a
  common pattern (open/write/fync/ftruncate) but also need the
  application to not check only for negative values and happens only for
  compressed inlined files.

  The conditions are narrow but as this could break userspace I think
  it's better to merge it now and not wait for the merge window"

* tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix error handling in btrfs_truncate()

ALSA: hda - Fix runtime PM

Before commit 3b5b899ca67d ("ALSA: hda: Make use of core codec functions
to sync power state"), hda_set_power_state() returned the response to
the Get Power State verb, a 32-bit unsigned integer whose expected value
is 0x233 after transitioning a codec to D3, and 0x0 after transitioning
it to D0.

The response value is significant because hda_codec_runtime_suspend()
does not clear the codec's bit in the codec_powered bitmask unless the
AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value.  That in
turn prevents the HDA controller from runtime suspending because
azx_runtime_idle() checks that the codec_powered bitmask is zero.

Since commit 3b5b899ca67d, hda_set_power_state() only returns 0x0 or
0x1, thereby breaking runtime PM for any HDA controller.  That's because
an inline function introduced by the commit returns a bool instead of a
32-bit unsigned int.  The change was likely erroneous and resulted from
copying and pasting snd_hda_check_power_state(), which is immediately
preceding the newly introduced inline function.  Fix it.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597
Fixes: 3b5b899ca67d ("ALSA: hda: Make use of core codec functions to sync power state")
Cc: Alex Deucher <[email protected]>
Cc: Abhijeet Kumar <[email protected]>
Reported-and-tested-by: Gunnar Krüger <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"

This reverts the following commits that change CMA design in MM.

3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA")

bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

Ville reported a following error on i386.

  Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
  microcode: microcode updated early to revision 0x4, date = 2013-06-28
  Initializing CPU#0
  Initializing HighMem for node 0 (000377fe:00118000)
  Initializing Movable for node 0 (00000001:00118000)
  BUG: Bad page state in process swapper  pfn:377fe
  page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
  flags: 0x80000000()
  raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
  page dumped because: nonzero mapcount
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
  Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
  Call Trace:
   dump_stack+0x60/0x96
   bad_page+0x9a/0x100
   free_pages_check_bad+0x3f/0x60
   free_pcppages_bulk+0x29d/0x5b0
   free_unref_page_commit+0x84/0xb0
   free_unref_page+0x3e/0x70
   __free_pages+0x1d/0x20
   free_highmem_page+0x19/0x40
   add_highpages_with_active_regions+0xab/0xeb
   set_highmem_pages_init+0x66/0x73
   mem_init+0x1b/0x1d7
   start_kernel+0x17a/0x363
   i386_start_kernel+0x95/0x99
   startup_32_smp+0x164/0x168

The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here.  I submitted the fix and it seems to work, but,
another problem happened.

It's so late time to fix the later problem so I decide to reverting the
series.

Reported-by: Ville Syrjälä <[email protected]>
Acked-by: Laura Abbott <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"Nothing too interesting.  Four patches to update the blacklist and
  add a controller ID"

* 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: Add PCI ID for Cannon Lake PCH-LP AHCI
  libata: blacklist Micron 500IT SSD with MU01 firmware
  libata: Apply NOLPM quirk for SAMSUNG PM830 CXM13D1Q.
  libata: Blacklist some Sandisk SSDs for NCQ

Merge tag 'for-linus-20180524' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Two fixes that should go into this release:

   - a loop writeback error clearing fix from Jeff

   - the sr sense fix from myself"

* tag 'for-linus-20180524' of git://git.kernel.dk/linux-block:
  loop: clear wb_err in bd_inode when detaching backing file
  sr: pass down correctly sized SCSI sense buffer

Merge tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a regression from the 4.15 cycle that caused the system suspend
  and resume overhead to increase on many systems and triggered more
  serious problems on some of them (Rafael Wysocki)"

* tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / core: Fix direct_complete handling for devices with no callbacks

bpf: properly enforce index mask to prevent out-of-bounds speculation

While reviewing the verifier code, I recently noticed that the
following two program variants in relation to tail calls can be
loaded.

Variant 1:

  # bpftool p d x i 15
    0: (15) if r1 == 0x0 goto pc+3
    1: (18) r2 = map[id:5]
    3: (05) goto pc+2
    4: (18) r2 = map[id:6]
    6: (b7) r3 = 7
    7: (35) if r3 >= 0xa0 goto pc+2
    8: (54) (u32) r3 &= (u32) 255
    9: (85) call bpf_tail_call#12
   10: (b7) r0 = 1
   11: (95) exit

  # bpftool m s i 5
    5: prog_array  flags 0x0
        key 4B  value 4B  max_entries 4  memlock 4096B
  # bpftool m s i 6
    6: prog_array  flags 0x0
        key 4B  value 4B  max_entries 160  memlock 4096B

Variant 2:

  # bpftool p d x i 20
    0: (15) if r1 == 0x0 goto pc+3
    1: (18) r2 = map[id:8]
    3: (05) goto pc+2
    4: (18) r2 = map[id:7]
    6: (b7) r3 = 7
    7: (35) if r3 >= 0x4 goto pc+2
    8: (54) (u32) r3 &= (u32) 3
    9: (85) call bpf_tail_call#12
   10: (b7) r0 = 1
   11: (95) exit

  # bpftool m s i 8
    8: prog_array  flags 0x0
        key 4B  value 4B  max_entries 160  memlock 4096B
  # bpftool m s i 7
    7: prog_array  flags 0x0
        key 4B  value 4B  max_entries 4  memlock 4096B

In both cases the index masking inserted by the verifier in order
to control out of bounds speculation from a CPU via b2157399cc98
("bpf: prevent out-of-bounds speculation") seems to be incorrect
in what it is enforcing. In the 1st variant, the mask is applied
from the map with the significantly larger number of entries where
we would allow to a certain degree out of bounds speculation for
the smaller map, and in the 2nd variant where the mask is applied
from the map with the smaller number of entries, we get buggy
behavior since we truncate the index of the larger map.

The original intent from commit b2157399cc98 is to reject such
occasions where two or more different tail call maps are used
in the same tail call helper invocation. However, the check on
the BPF_MAP_PTR_POISON is never hit since we never poisoned the
saved pointer in the first place! We do this explicitly for map
lookups but in case of tail calls we basically used the tail
call map in insn_aux_data that was processed in the most recent
path which the verifier walked. Thus any prior path that stored
a pointer in insn_aux_data at the helper location was always
overridden.

Fix it by moving the map pointer poison logic into a small helper
that covers both BPF helpers with the same logic. After that in
fixup_bpf_calls() the poison check is then hit for tail calls
and the program rejected. Latter only happens in unprivileged
case since this is the *only* occasion where a rewrite needs to
happen, and where such rewrite is specific to the map (max_entries,
index_mask). In the privileged case the rewrite is generic for
the insn->imm / insn->code update so multiple maps from different
paths can be handled just fine since all the remaining logic
happens in the instruction processing itself. This is similar
to the case of map lookups: in case there is a collision of
maps in fixup_bpf_calls() we must skip the inlined rewrite since
this will turn the generic instruction sequence into a non-
generic one. Thus the patch_call_imm will simply update the
insn->imm location where the bpf_map_lookup_elem() will later
take care of the dispatch. Given we need this 'poison' state
as a check, the information of whether a map is an unpriv_array
gets lost, so enforcing it prior to that needs an additional
state. In general this check is needed since there are some
complex and tail call intensive BPF programs out there where
LLVM tends to generate such code occasionally. We therefore
convert the map_ptr rather into map_state to store all this
w/o extra memory overhead, and the bit whether one of the maps
involved in the collision was from an unpriv_array thus needs
to be retained as well there.

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'batadv-next-for-davem-20180524' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- Disable batman-adv debugfs by default, by Sven Eckelmann

- Improve handling mesh nodes with multicast optimizations disabled,
by Linus Luessing

- Avoid bool in structs, by Sven Eckelmann

- Allocate less memory when debugfs is disabled, by Sven Eckelmann

- Fix batadv_interface_tx return data type, by Luc Van Oostenryck

- improve link speed handling for virtual interfaces, by Marek Lindner

- Enable BATMAN V algorithm by default, by Marek Lindner
====================

Signed-off-by: David S. Miller <[email protected]>

ahci: Add PCI ID for Cannon Lake PCH-LP AHCI

This one should be using the default LPM policy for mobile chipsets so
add the PCI ID to the driver list of supported revices.

Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]

bpfilter: don't pass O_CREAT when opening console for debug

Passing O_CREAT (00000100) to open means we should also pass file
mode as the third parameter. Creating /dev/console as a regular
file may not be helpful anyway, so simply drop the flag when
opening debug_fd.

Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpfilter: fix build dependency

BPFILTER could have been enabled without INET causing this build error:
ERROR: "bpfilter_process_sockopt" [net/bpfilter/bpfilter.ko] undefined!

Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

arm64: Make sure permission updates happen for pmd/pud

Commit 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings")
disallowed block mappings for ioremap since that code does not honor
break-before-make. The same APIs are also used for permission updating
though and the extra checks prevent the permission updates from happening,
even though this should be permitted. This results in read-only permissions
not being fully applied. Visibly, this can occasionaly be seen as a failure
on the built in rodata test when the test data ends up in a section or
as an odd RW gap on the page table dump. Fix this by using
pgattr_change_is_safe instead of p*d_present for determining if the
change is permitted.

Reviewed-by: Kees Cook <[email protected]>
Tested-by: Peter Robinson <[email protected]>
Reported-by: Peter Robinson <[email protected]>
Fixes: 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings")
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

Merge branch 'bpf-ipv6-seg6-bpf-action'

Mathieu Xhonneux says:

====================
As of Linux 4.14, it is possible to define advanced local processing for
IPv6 packets with a Segment Routing Header through the seg6local LWT
infrastructure. This LWT implements the network programming principles
defined in the IETF "SRv6 Network Programming" draft.

The implemented operations are generic, and it would be very interesting to
be able to implement user-specific seg6local actions, without having to
modify the kernel directly. To do so, this patchset adds an End.BPF action
to seg6local, powered by some specific Segment Routing-related helpers,
which provide SR functionalities that can be applied on the packet. This
BPF hook would then allow to implement specific actions at native kernel
speed such as OAM features, advanced SR SDN policies, SRv6 actions like
Segment Routing Header (SRH) encapsulation depending on the content of
the packet, etc.

This patchset is divided in 6 patches, whose main features are :

- A new seg6local action End.BPF with the corresponding new BPF program
  type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
  passed to the LWT seg6local through netlink, the same way as the LWT
  BPF hook operates.
- 3 new BPF helpers for the seg6local BPF hook, allowing to edit/grow/
  shrink a SRH and apply on a packet some of the generic SRv6 actions.
- 1 new BPF helper for the LWT BPF IN hook, allowing to add a SRH through
  encapsulation (via IPv6 encapsulation or inlining if the packet contains
  already an IPv6 header).

As this patchset adds a new LWT BPF hook, I took into account the result
of the discussions when the LWT BPF infrastructure got merged. Hence, the
seg6local BPF hook doesn't allow write access to skb->data directly, only
the SRH can be modified through specific helpers, which ensures that the
integrity of the packet is maintained. More details are available in the
related patches messages.

The performances of this BPF hook have been assessed with the BPF JIT
enabled on an Intel Xeon X3440 processors with 4 cores and 8 threads
clocked at 2.53 GHz. No throughput losses are noted with the seg6local
BPF hook when the BPF program does nothing (440kpps). Adding a 8-bytes
TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
drops the throughput to 410kpps, and inlining a SRH via bpf_lwt_seg6_action
drops the throughput to 420kpps. All throughputs are stable.

Changelog:

v2: move the SRH integrity state from skb->cb to a per-cpu buffer
v3: - document helpers in man-page style
    - fix kbuild bugs
    - un-break BPF LWT out hook
    - bpf_push_seg6_encap is now static
    - preempt_enable is now called when the packet is dropped in
      input_action_end_bpf
v4: fix kbuild bugs when CONFIG_IPV6=m
v5: fix kbuild sparse warnings when CONFIG_IPV6=m
v6: fix skb pointers-related bugs in helpers
v7: - fix memory leak in error path of End.BPF setup
    - add freeing of BPF data in seg6_local_destroy_state
    - new enums SEG6_LOCAL_BPF_* instead of re-using ones of lwt bpf for
      netlink nested bpf attributes
    - SEG6_LOCAL_BPF_PROG attr now contains prog->aux->id when dumping
      state
====================

Signed-off-by: Daniel Borkmann <[email protected]>

selftests/bpf: test for seg6local End.BPF action

Add a new test for the seg6local End.BPF action. The following helpers
are also tested:

- bpf_lwt_push_encap within the LWT BPF IN hook
- bpf_lwt_seg6_action
- bpf_lwt_seg6_adjust_srh
- bpf_lwt_seg6_store_bytes

A chain of End.BPF actions is built. The SRH is injected through a LWT
BPF IN hook before entering this chain. Each End.BPF action validates
the previous one, otherwise the packet is dropped. The test succeeds
if the last node in the chain receives the packet and the UDP datagram
contained can be retrieved from userspace.

Signed-off-by: Mathieu Xhonneux <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

ipv6: sr: Add seg6local action End.BPF

This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.

Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.

Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.

This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.

The BPF program may return 3 types of return codes:
    - BPF_OK: the End.BPF action will look up the next destination through
             seg6_lookup_nexthop.
    - BPF_REDIRECT: if an action has been executed through the
          bpf_lwt_seg6_action helper, the BPF program should return this
          value, as the skb's destination is already set and the default
          lookup should not be performed.
    - BPF_DROP : the packet will be dropped.

Signed-off-by: Mathieu Xhonneux <[email protected]>
Acked-by: David Lebrun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: Split lwt inout verifier structures

The new bpf_lwt_push_encap helper should only be accessible within the
LWT BPF IN hook, and not the OUT one, as this may lead to a skb under
panic.

At the moment, both LWT BPF IN and OUT share the same list of helpers,
whose calls are authorized by the verifier. This patch separates the
verifier ops for the IN and OUT hooks, and allows the IN hook to call the
bpf_lwt_push_encap helper.

This patch is also the occasion to put all lwt_*_func_proto functions
together for clarity. At the moment, socks_op_func_proto is in the middle
of lwt_inout_func_proto and lwt_xmit_func_proto.

Signed-off-by: Mathieu Xhonneux <[email protected]>
Acked-by: David Lebrun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: Add IPv6 Segment Routing helpers

The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
    - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
    - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
                               (to add/delete TLVs)
    - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
                           (specifically End.X, End.T, End.B6 and
                            End.B6.Encap)

The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).

The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.

Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.

Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.

Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.

These helpers require CONFIG_IPV6=y (and not =m).

Signed-off-by: Mathieu Xhonneux <[email protected]>
Acked-by: David Lebrun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

ipv6: sr: export function lookup_nexthop

The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper allowing to apply some of these
actions on the packet being processed, the helper should be able to call
this function, hence the need to make it public.

Moreover, if one argument is incorrect or if the next hop can not be found,
an error should be returned by the BPF helper so the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.

Signed-off-by: Mathieu Xhonneux <[email protected]>
Acked-by: David Lebrun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

ipv6: sr: make seg6.h includable without IPv6

include/net/seg6.h cannot be included in a source file if CONFIG_IPV6 is
not enabled:
   include/net/seg6.h: In function 'seg6_pernet':
>> include/net/seg6.h:52:14: error: 'struct net' has no member named
                                        'ipv6'; did you mean 'ipv4'?
     return net->ipv6.seg6_data;
                 ^~~~
                 ipv4

This commit makes seg6_pernet return NULL if IPv6 is not compiled, hence
allowing seg6.h to be included regardless of the configuration.

Signed-off-by: Mathieu Xhonneux <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Btrfs: fix error handling in btrfs_truncate()

Jun Wu at Facebook reported that an internal service was seeing a return
value of 1 from ftruncate() on Btrfs in some cases. This is coming from
the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items().

btrfs_truncate() uses two variables for error handling, ret and err.
When btrfs_truncate_inode_items() returns non-zero, we set err to the
return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we
only set err if ret is an error (i.e., negative).

To reproduce the issue: mount a filesystem with -o compress-force=zstd
and the following program will encounter return value of 1 from
ftruncate:

int main(void) {
        char buf[256] = { 0 };
        int ret;
        int fd;

        fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666);
        if (fd == -1) {
                perror("open");
                return EXIT_FAILURE;
        }

        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                perror("write");
                close(fd);
                return EXIT_FAILURE;
        }

        if (fsync(fd) == -1) {
                perror("fsync");
                close(fd);
                return EXIT_FAILURE;
        }

        ret = ftruncate(fd, 128);
        if (ret) {
                printf("ftruncate() returned %d\n", ret);
                close(fd);
                return EXIT_FAILURE;
        }

        close(fd);
        return EXIT_SUCCESS;
}

Fixes: ddfae63cc8e0 ("btrfs: move btrfs_truncate_block out of trans handle")
CC: [email protected] # 4.15+
Reported-by: Jun Wu <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Merge branch 'bpf-multi-prog-improvements'

Sandipan Das says:

====================
[1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.

[2] Provide a way for resolving function calls because of the way JITed
    images are allocated in powerpc64.

[3] Fix to get JITed instruction dumps for multi-function programs from
    the bpf system call.

[4] Fix for bpftool to show delimited multi-function JITed image dumps.

v4:
- Incorporate review comments from Jakub.
- Fix JSON output for bpftool.

v3:
- Change base tree tag to bpf-next.
- Incorporate review comments from Alexei, Daniel and Jakub.
- Make sure that the JITed image does not grow or shrink after
   the last pass due to the way the instruction sequence used
   to load a callee's address maybe optimized.
- Make additional changes to the bpf system call and bpftool to
   make multi-function JITed dumps easier to correlate.

v2:
- Incorporate review comments from Jakub.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: add delimiters to multi-function JITed dumps

This splits up the contiguous JITed dump obtained via the bpf
system call into more relatable chunks for each function in
the program. If the kernel symbols corresponding to these are
known, they are printed in the header for each JIT image dump
otherwise the masked start address is printed.

Before applying this patch:

  # bpftool prog dump jited id 1

     0: push   %rbp
     1: mov    %rsp,%rbp
  ...
    70: leaveq
    71: retq
    72: push   %rbp
    73: mov    %rsp,%rbp
  ...
    dd: leaveq
    de: retq

  # bpftool -p prog dump jited id 1

  [{
          "pc": "0x0",
          "operation": "push",
          "operands": ["%rbp"
          ]
      },{
  ...
      },{
          "pc": "0x71",
          "operation": "retq",
          "operands": [null
          ]
      },{
          "pc": "0x72",
          "operation": "push",
          "operands": ["%rbp"
          ]
      },{
  ...
      },{
          "pc": "0xde",
          "operation": "retq",
          "operands": [null
          ]
      }
  ]

After applying this patch:

  # echo 0 > /proc/sys/net/core/bpf_jit_kallsyms
  # bpftool prog dump jited id 1

  0xffffffffc02c7000:
     0: push   %rbp
     1: mov    %rsp,%rbp
  ...
    70: leaveq
    71: retq

  0xffffffffc02cf000:
     0: push   %rbp
     1: mov    %rsp,%rbp
  ...
    6b: leaveq
    6c: retq

  # bpftool -p prog dump jited id 1

  [{
          "name": "0xffffffffc02c7000",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x71",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      },{
          "name": "0xffffffffc02cf000",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x6c",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      }
  ]

  # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms
  # bpftool prog dump jited id 1

  bpf_prog_b811aab41a39ad3d_foo:
     0: push   %rbp
     1: mov    %rsp,%rbp
  ...
    70: leaveq
    71: retq

  bpf_prog_cf418ac8b67bebd9_F:
     0: push   %rbp
     1: mov    %rsp,%rbp
  ...
    6b: leaveq
    6c: retq

  # bpftool -p prog dump jited id 1

  [{
          "name": "bpf_prog_b811aab41a39ad3d_foo",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x71",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      },{
          "name": "bpf_prog_cf418ac8b67bebd9_F",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x6c",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      }
  ]

Signed-off-by: Sandipan Das <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpf: sync bpf uapi header

Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
JITed image lengths of each function in a multi-function
program.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: get JITed image lengths of functions via syscall

This adds new two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of the JITed image lengths of each function for a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.

This can be used by userspace applications like bpftool
to split up the contiguous JITed dump, also obtained via
the system call, into more relatable chunks corresponding
to each function.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: fix multi-function JITed dump obtained via syscall

Currently, for multi-function programs, we cannot get the JITed
instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
command. Because of this, userspace tools such as bpftool fail
to identify a multi-function program as being JITed or not.

With the JIT enabled and the test program running, this can be
verified as follows:

  # cat /proc/sys/net/core/bpf_jit_enable
  1

Before applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
          loaded_at 2018-05-16T11:43:38+0530  uid 0
          xlated 216B  not jited  memlock 65536B
  ...

  # bpftool prog dump jited id 1
  no instructions returned

After applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
          loaded_at 2018-05-16T12:13:01+0530  uid 0
          xlated 216B  jited 308B  memlock 65536B
  ...

  # bpftool prog dump jited id 1
     0:   nop
     4:   nop
     8:   mflr    r0
     c:   std     r0,16(r1)
    10:   stdu    r1,-112(r1)
    14:   std     r31,104(r1)
    18:   addi    r31,r1,48
    1c:   li      r3,10
  ...

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: resolve calls without using imm field

Currently, we resolve the callee's address for a JITed function
call by using the imm field of the call instruction as an offset
from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
use this address to get the callee's kernel symbol's name.

For some architectures, such as powerpc64, the imm field is not
large enough to hold this offset. So, instead of assigning this
offset to the imm field, the verifier now assigns the subprog
id. Also, a list of kernel symbol addresses for all the JITed
functions is provided in the program info. We now use the imm
field as an index for this list to lookup a callee's symbol's
address and resolve its name.

Suggested-by: Daniel Borkmann <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpf: sync bpf uapi header

Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a program.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: get kernel symbol addresses via syscall

This adds new two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.

When bpf_jit_kallsyms is enabled, we can get the address
of the corresponding kernel symbol for a callee function
and resolve the symbol's name. The address is determined
by adding the value of the call instruction's imm field
to __bpf_call_base. This offset gets assigned to the imm
field by the verifier.

For some architectures, such as powerpc64, the imm field
is not large enough to hold this offset.

We resolve this by:

[1] Assigning the subprog id to the imm field of a call
    instruction in the verifier instead of the offset of
    the callee's symbol's address from __bpf_call_base.

[2] Determining the address of a callee's corresponding
    symbol by using the imm field as an index for the
    list of kernel symbol addresses now available from
    the program info.

Suggested-by: Daniel Borkmann <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: powerpc64: add JIT support for multi-function programs

This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fixup the branch target addresses, we need
to perform an extra pass.

Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: powerpc64: pad function address loads with NOPs

For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.

Since we come to know of the callee's address only before the
extra pass, the number of instructions required to load this
address may vary from what was previously generated. This can
make the JITed image grow or shrink.

To avoid this, we should generate a constant five-instruction
when loading function addresses by padding the optimized load
sequence with NOPs.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: support 64-bit offsets for bpf function calls

The imm field of a bpf instruction is a signed 32-bit integer.
For JITed bpf-to-bpf function calls, it holds the offset of the
start address of the callee's JITed image from __bpf_call_base.

For some architectures, such as powerpc64, this offset may be
as large as 64 bits and cannot be accomodated in the imm field
without truncation.

We resolve this by:

[1] Additionally using the auxiliary data of each function to
    keep a list of start addresses of the JITed images for all
    functions determined by the verifier.

[2] Retaining the subprog id inside the off field of the call
    instructions and using it to index into the list mentioned
    above and lookup the callee's address.

To make sure that the existing JIT compilers continue to work
without requiring changes, we keep the imm field as it is.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: btf: Avoid variable length array

Sparse warning:
kernel/bpf/btf.c:1985:34: warning: Variable length array is used.

This patch directly uses ARRAY_SIZE().

Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header")
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

RDMA/hns: Move the location for initializing tmp_len

When posted work request, it need to compute the length of
all sges of every wr and fill it into the msg_len field of
send wqe. Thus, While posting multiple wr,
tmp_len should be reinitialized to zero.

Fixes: 8b9b8d143b46 ("RDMA/hns: Fix the endian problem for hns")
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/hns: Bugfix for cq record db for kernel

When use cq record db for kernel, it needs to set the hr_cq->db_en
to 1 and configure the dma address of record cq db of qp context.

Fixes: 86188a8810ed ("RDMA/hns: Support cq record doorbell for kernel space")
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

IB/uverbs: Fix uverbs_attr_get_obj

The err pointer comes from uverbs_attr_get, not from the uobject member,
which does not store an ERR_PTR.

Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>

RDMA/qedr: Fix doorbell bar mapping for dpi > 1

Each user_context receives a separate dpi value and thus a different
address on the doorbell bar. The qedr_mmap function needs to validate
the address and map the doorbell bar accordingly.
The current implementation always checked against dpi=0 doorbell range
leading to a wrong mapping for doorbell bar. (It entered an else case
that mapped the address differently). qedr_mmap should only be used
for doorbells, so the else was actually wrong in the first place.
This only has an affect on arm architecture and not an issue on a
x86 based architecture.
This lead to doorbells not occurring on arm based systems and left
applications that use more than one dpi (or several applications
run simultaneously ) to hang.

Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs")
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.

2) Add support for map lookups to numgen, random and hash expressions,
   from Laura Garcia.

3) Allow to register nat hooks for iptables and nftables at the same
   time. Patchset from Florian Westpha.

4) Timeout support for rbtree sets.

5) ip6_rpfilter works needs interface for link-local addresses, from
   Vincent Bernat.

6) Add nf_ct_hook and nf_nat_hook structures and use them.

7) Do not drop packets on packets raceing to insert conntrack entries
   into hashes, this is particularly a problem in nfqueue setups.

8) Address fallout from xt_osf separation to nf_osf, patches
   from Florian Westphal and Fernando Mancera.

9) Remove reference to struct nft_af_info, which doesn't exist anymore.
   From Taehee Yoo.

This batch comes with is a conflict between 25fd386e0bc0 ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd3981f
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0bc0 - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd3981f, as described by:

diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
  EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
  #endif /* CONFIG_NF_CONNTRACK */

- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
-#ifdef CONFIG_NF_NAT_NEEDED
-void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(nf_nat_decode_session_hook);
-#endif
-
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
  {
   int h;

I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2018-05-21

The following updates are included in this driver update series:

- Fix the debug output for the max channels count
- Read (once) and save the port property registers during probe
- Remove the use of the comm_owned field
- Remove unused SFP diagnostic support indicator field
- Add ethtool --module-info support
- Add ethtool --show-ring/--set-ring support
- Update the driver in preparation for ethtool --set-channels support
- Add ethtool --show-channels/--set-channels support
- Update the driver to always perform link training in KR mode
- Advertise FEC support when using a KR re-driver
- Update the BelFuse quirk to now support SGMII
- Improve 100Mbps auto-negotiation for BelFuse parts

This patch series is based on net-next.

---

Changes since v1:
- Update the --set-channels support to the use of the combined, rx and
  tx options as specified in the ethtool man page (in other words, don't
  create combined channels based on the min of the tx and rx channels
  specified).
====================

Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Improve SFP 100Mbps auto-negotiation

After changing speed to 100Mbps as a result of auto-negotiation (AN),
some 10/100/1000Mbps SFPs indicate a successful link (no faults or loss
of signal), but cannot successfully transmit or receive data. These
SFPs required an extra auto-negotiation (AN) after the speed change in
order to operate properly. Add a quirk for these SFPs so that if the
outcome of the AN actually results in changing to a new speed, re-initiate
AN at that new speed.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Update the BelFuse quirk to support SGMII

Instead of using a quirk to make the BelFuse 1GBT-SFP06 part look like
a 1000baseX part, program the SFP PHY to support SGMII and 10/100/1000
baseT.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Advertise FEC support with the KR re-driver

When a KR re-driver is present, indicate the FEC support is available
during auto-negotiation.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Always attempt link training in KR mode

Link training is always attempted when in KR mode, but the code is
structured to check if link training has been enabled before attempting
to perform it. Since that check will always be true, simplify the code
to always enable and start link training during KR auto-negotiation.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Add ethtool show/set channels support

Add ethtool support to show and set the device channel configuration.
Changing the channel configuration will result in a device restart.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Prepare for ethtool set-channel support

In order to support being able to dynamically set/change the number of
Rx and Tx channels, update the code to:
- Move alloc and free of device memory into callable functions
- Move setting of the real number of Rx and Tx channels to device startup
- Move mapping of the RSS channels to device startup

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Add ethtool show/set ring parameter support

Add ethtool support to show and set the number of the Rx and Tx ring
descriptors. Changing the ring configuration will result in a device
restart.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Add ethtool support to retrieve SFP module info

Add support to get SFP module information using ethtool.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Remove field that indicates SFP diagnostic support

The driver currently sets an indication of whether the SFP supports, and
that the driver can obtain, diagnostics data. This isn't currently used
by the driver and the logic to set this indicator is flawed because the
field is cleared each time the SFP is checked and only set when a new SFP
is detected. Remove this field and the logic supporting it.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Remove use of comm_owned field

The comm_owned field can hide logic where double locking is attempted
and prevent multiple threads for the same device from accessing the
mutex properly. Remove the comm_owned field and use the mutex API
exclusively for gaining ownership. The current driver has been audited
and is obtaining communications ownership properly.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Read and save the port property registers during probe

Read and save the port property registers once during the device probe
and then use the saved values as they are needed.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Fix debug output of max channel counts

A debug output print statement uses the wrong variable to output the
maximum Rx channel count (cut and paste error, basically). Fix the
statement to use the proper variable.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'smc-next'

Ursula Braun says:

====================
patches 2018-05-23

here are more smc-patches for net-next:

Patch 1 fixes an ioctl problem detected by syzbot.

Patch 2 improves smc_lgr_list locking in case of abnormal link
group termination. If you want to receive a version for the net-tree,
please let me know. It would look somewhat different, since the port
terminate code has been moved to smc_core.c on net-next.

Patch 3 enables SMC to deal with urgent data.

Patch 4 is a minor improvement to avoid out-of-sync linkgroups
between 2 peers.
====================

Signed-off-by: David S. Miller <[email protected]>

net/smc: longer delay when freeing client link groups

Client link group creation always follows the server linkgroup creation.
If peer creates a new server link group, client has to create a new
client link group. If peer reuses a server link group for a new
connection, client has to reuse its client link group as well. To
avoid out-of-sync conditions for link groups a longer delay for
for client link group removal is defined to make sure this link group
still exists, once the peer decides to reuse a server link group.

Currently the client link group delay time is just 10 jiffies larger
than the server link group delay time. This patch increases the delay
difference to 10 seconds to have a better protection against
out-of-sync link groups.

Signed-off-by: Ursula Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: urgent data support

Add support for out of band data send and receive.

Signed-off-by: Stefan Raspl <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: lock smc_lgr_list in port_terminate()

Currently, smc_port_terminate() is not holding the lock of the lgr list
while it is traversing the list. This patch adds locking to this
function and changes smc_lgr_terminate() accordingly.

Signed-off-by: Hans Wippel <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: return 0 for ioctl calls in states INIT and CLOSED

A connected SMC-socket contains addresses of descriptors for the
send buffer and the rmb (receive buffer). Fields of these descriptors
are used to determine the answer for certain ioctl requests.
Add extra handling for unconnected SMC socket states without valid
buffer descriptor addresses.

Signed-off-by: Ursula Braun <[email protected]>
Reported-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>

cxgb4: do L1 config when module is inserted

trigger an L1 configure operation when a transceiver module
is inserted in order to cause current "sticky" options like
Requested Forward Error Correction to be reapplied.

Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: change the port capability bits definition

MDI Port Capabilities bit definitions were inconsistent with
regard to the MDI enum values. 2 bits used to define MDI in
the port capabilities are not really separable, it's a 2-bit
field with 4 different values. Change the port capability bit
definitions to be "AUTO" and "STRAIGHT" in order to get them
to line up with the enum's.

Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mac80211-next-for-davem-2018-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

For this round, we have various things all over the place, notably
* a fix for a race in aggregation, which I want to let
bake for a bit longer before sending to stable
* some new statistics (ACK RSSI, TXQ)
* TXQ configuration
* preparations for HE, particularly radiotap
* replace confusing "country IE" by "country element" since it's
not referring to Ireland

Note that I merged net-next to get a fix from mac80211 that got
there via net, to apply one patch that would otherwise conflict.
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx4: Fix irq-unsafe spinlock usage

spin_lock/unlock was used instead of spin_un/lock_irq
in a procedure used in process space, on a spinlock
which can be grabbed in an interrupt.

This caused the stack trace below to be displayed (on kernel
4.17.0-rc1 compiled with Lock Debugging enabled):

[  154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[  154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G          I
[  154.675856] -----------------------------------------------------
[  154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[  154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core]
[  154.700927]
and this task is already holding:
[  154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib]
[  154.718028] which would create a new lock dependency:
[  154.723705]  (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.}
[  154.731922]
but this new dependency connects a SOFTIRQ-irq-safe lock:
[  154.740798]  (&(&cq->lock)->rlock){..-.}
[  154.740800]
... which became SOFTIRQ-irq-safe at:
[  154.752163]   _raw_spin_lock_irqsave+0x3e/0x50
[  154.757163]   mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib]
[  154.762554]   ipoib_tx_poll+0x4a/0xf0 [ib_ipoib]
...
to a SOFTIRQ-irq-unsafe lock:
[  154.815603]  (&(&qp_table->lock)->rlock){+.+.}
[  154.815604]
... which became SOFTIRQ-irq-unsafe at:
[  154.827718] ...
[  154.827720]   _raw_spin_lock+0x35/0x50
[  154.833912]   mlx4_qp_lookup+0x1e/0x50 [mlx4_core]
[  154.839302]   mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]

Since mlx4_qp_lookup() is called only in process space, we can
simply replace the spin_un/lock calls with spin_un/lock_irq calls.

Fixes: 6dc06c08bef1 ("net/mlx4: Fix the check in attaching steering rules")
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qca8k-QCA8334-switch-support'

Michal Vokáč says:

====================
Add support for QCA8334 switch

This series basically adds support for a QCA8334 ethernet switch to the
qca8k driver. It is a four-port variant of the already supported seven
port QCA8337. Register map is the same for the whole familly and all chips
have the same device ID.

Major part of this series enhances the CPU port setting. Currently the CPU
port is not set to any sensible defaults compatible with the xGMII
interface. This series forces the CPU port to its maximum bandwidth and
also allows to adjust the new defaults using fixed-link device tree
sub-node.

Alongside these changes I fixed two checkpatch warnings regarding SPDX and
redundant parentheses.

Changes in v3:
- Rebased on latest net-next/master.
- Corrected fixed-link documentation.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Remove redundant parentheses

Fix warning reported by checkpatch.

Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Replace GPL boilerplate by SPDX

Replace the GPLv2 license boilerplate with the SPDX license identifier.

Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Allow overwriting CPU port setting

Implement adjust_link function that allows to overwrite default CPU port
setting using fixed-link device tree subnode.

Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Force CPU port to its highest bandwidth

By default autonegotiation is enabled to configure MAC on all ports.
For the CPU port autonegotiation can not be used so we need to set
some sensible defaults manually.

This patch forces the default setting of the CPU port to 1000Mbps/full
duplex which is the chip maximum capability.

Also correct size of the bit field used to configure link speed.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Enable RXMAC when bringing up a port

When a port is brought up/down do not enable/disable only the TXMAC
but the RXMAC as well. This is essential for the CPU port to work.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Add support for QCA8334 switch

Add support for the four-port variant of the Qualcomm QCA833x switch.

Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: Add QCA8334 binding documentation

Add support for the four-port variant of the Qualcomm QCA833x switch.

The CPU port default link settings can be reconfigured using
a fixed-link sub-node.

Signed-off-by: Michal Vokáč <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Add new T6 device ids

Add 0x6088 and 0x6089 device ids for new T6 cards.

Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: broadcom: Fix bcm_write_exp()

On newer PHYs, we need to select the expansion register to write with
setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior
to being migrated to generic code under bcm-phy-lib.c which
unfortunately used the older implementation from the BCM54xx days.

Fix this by creating an inline stub: bcm_write_exp_sel() which adds the
correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY
and BCM7xxx PHY drivers which require setting these bits.

broadcom.c is unchanged because some PHYs even use a different selector
method, so let them specify it directly (e.g: SerDes secondary selector).

Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: uevent filtering

Recent discussions around uevent filtering (cf. net-next commit [1], [2],
and [3] and discussions in [4], [5], and [6]) have shown that the semantics
around uevent filtering where not well understood.
Now that we have settled - at least for the moment - how uevent filtering
should look like let's add some selftests to ensure we don't regress
anything in the future.
Note, the semantics of uevent filtering are described in detail in my
commit message to [2] so I won't repeat them here.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90d52d4fd82007005125d9a8d2d560a1ca059b9d
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a3498436b3a0f8ec289e6847e1de40b4123e1639
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=26045a7b14bc7a5455e411d820110f66557d6589
[4]: https://lkml.org/lkml/2018/4/4/739
[5]: https://lkml.org/lkml/2018/4/26/767
[6]: https://lkml.org/lkml/2018/4/26/738

Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: broadcom: Fix auxiliary control register reads

We are currently doing auxiliary control register reads with the shadow
register value 0b111 (0x7) which incidentally is also the selector value
that should be present in bits [2:0]. Fix this by using the appropriate
selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK).

This does not have a functional impact yet because we always access the
MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code.
This might change at some point though.

Fixes: 5b4e29005123 ("net: phy: broadcom: add bcm54xx_auxctl_read")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'fib-rule-selftest'

Roopa Prabhu says:

====================
fib rule selftest

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get,
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: net: initial fib rule tests

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: support sport, dport and ip_proto in RTM_GETROUTE

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: support sport, dport and ip_proto in RTM_GETROUTE

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy

Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message

Trivial fix to spelling mistake in mlx4_dbg debug message and also
change the phrasing of the message so that is is more readable

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

hv_netvsc: Add handlers for ethtool get/set msg level

The handlers for ethtool get/set msg level are missing from netvsc.
This patch adds them.

Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Only do H_EOI for mobility events

When enabling the sub-CRQ IRQ a previous update sent a H_EOI prior
to the enablement to clear any pending interrupts that may be present
across a partition migration. This fixed a firmware bug where a
migration could erroneously indicate that a H_EOI was pending.

The H_EOI should only be sent when enabling during a mobility
event though. Doing so at other time could wrong and can produce
extra driver output when IRQs are enabled when doing TX completion.

Signed-off-by: Nathan Fontenot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION

Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION
to fix spelling mistake.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'udp-gso-fixes'

Willem de Bruijn says:

====================
udp gso fixes

A few small fixes:
- disallow segmentation with XFRM
- do not leak gso packets into the ingress path

Changes
  v1 -> v2
  - fix build failure in team.c
  - drop scatter-gather fix:
      this is now fixed by commit 113f99c33585 ("net: test tailroom
      before appending to linear skb"). After this patch gso skbs are
      built non-linear regardless of NETIF_F_SG and skb_segment builds
      linear segs.
====================

Signed-off-by: David S. Miller <[email protected]>

gso: limit udp gso to egress-only virtual devices

Until the udp receive stack supports large packets (UDP GRO), GSO
packets must not loop from the egress to the ingress path.

Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
may loop packets, such as veth and macvlan.

Instead add it to specific devices that forward to another device's
egress path, bonding and team.

Fixes: 83aa025f535f ("udp: add gso support to virtual devices")
CC: Alexander Duyck <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

udp: exclude gso from xfrm paths

UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-for-davem-2018-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.17

Hopefully the last fixes for 4.17. ssb is again causing problems so we
had to revert a commit and fix it better. Also a small fix to bcma and
some MAINTAINERS file updates.

ssb

* fix regression with all module PCI cards, for example using b43 and
b44 drivers

* try again fixing a MIPS linker error

bcma

* fix truncated info log messages
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-sfp-small-improvements'

Antoine Tenart says:

====================
net: sfp: small improvements

A small series of patches improving the SFP support by adding a warning
when no Tx disable pin is available, and making the i2c-bus property
mandatory.

Thanks!
Antoine

Since v1:
  - Removed the patch fixing the sfp driver when no i2c bus was described.
  - Made two new patches to make the i2c-bus property mandatory for sfp modules.

Since the phylink series:
  - s/-EOPNOTSUPP/-ENODEV/ in patch 1/2.
  - I added the acked-by tag in patch 2/2.
====================

Signed-off-by: David S. Miller <[email protected]>

Documentation/bindings: net: the sfp i2c-bus property is now mandatory

The i2c-bus property for sfp modules was made mandatory. Update the
documentation to keep it in sync with the driver's behaviour.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: sfp: make the i2c-bus dt property mandatory

This patch makes the i2c-bus property mandatory when using a device
tree. If the sfp i2c bus isn't described it's impossible to guess the
protocol to use for a given module, and the sfp module would then not
work in most cases.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: sfp: warn the user when no tx_disable pin is available

In case no Tx disable pin is available the SFP modules will always be
emitting. This could be an issue when using modules using laser as their
light source as we would have no way to disable it when the fiber is
removed. This patch adds a warning when registering an SFP cage which do
not have its tx_disable pin wired or available.

Signed-off-by: Antoine Tenart <[email protected]>
Acked-by: Russell King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tuntap: correctly set SOCKWQ_ASYNC_NOSPACE

When link is down, writes to the device might fail with
-EIO. Userspace needs an indication when the status is resolved. As a
fix, tun_net_open() attempts to wake up writers - but that is only
effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is
not the case of vhost_net which only poll for EPOLLOUT after it meets
errors during sendmsg().

This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when
socket is not writable or device is down to guarantee EPOLLOUT will be
raised in either tun_chr_poll() or tun_sock_write_space() after device
is up.

Cc: Hannes Frederic Sowa <[email protected]>
Cc: Eric Dumazet <[email protected]>
Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'nfp-abm-add-basic-support-for-advanced-buffering-NIC'

Jakub Kicinski says:

====================
nfp: abm: add basic support for advanced buffering NIC

This series lays groundwork for advanced buffer management NIC feature.
It makes necessary NFP core changes, spawns representors and adds devlink
glue. Following series will add the actual buffering configuration (patch
series size limit).

First three patches add support for configuring NFP buffer pools via a
mailbox. The existing devlink APIs are used for the purpose.

Third patch allows us to perform small reads from the NFP memory.

The rest of the patch set adds eswitch mode change support and makes
the driver spawn appropriate representors.
====================

Signed-off-by: David S. Miller <[email protected]>

nfp: assign vNIC id as phys_port_name of vNICs which are not ports

When NFP is modelled as a switch we assign phys_port_name to respective
port(representor )s:

vNIC0 - | - PF port (pf%d) MAC/PHY (p%d[s%d]) - |E==

In most cases there is only one vNIC for communication with the switch.
If there is more than one we need to be able to identify them. Use %d
as phys_port_name of the vNICs.

We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any
more.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: use split in naming of PCIe PF ports

PCI PFs can host more than one logical endpoint.  In NFP terms
this means having more than one vNIC for PCIe PF.  The vNICs
are usually corresponding 1:1 to Ethernet ports.  In core NIC
we use the legacy idea of vNIC *being* the Ethernet port,
hence netdevs put pX(sY) in their phys_port_name, like Ethernet
ports would.  When ASIC ports are fully represented we need to
be able to name different PCIe PF ports, too.  Use a scheme
similar to Ethernet ports - pfXsY, for PCIe PF number X,
sub-port Y.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: abm: force Ethternet port up

Current control firmware does not cater too well to multi-host
applications. There is no way to check which hosts are up or
otherwise negotiate what the state of the external port (the
Ethernet port) should be. Make sure the link is up when driver
loads, and don't take it down when Ethernet port netdev is
closed.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: abm: spawn port netdevs

To configure buffering points we need full set of netdevs:

ASIC

user netdev -- | -- PCIe port MAC port -- | --

Configuring egrees qdiscs on user netdev configures standard
Linux TC software qdiscs, configuring PCIe port qdiscs will
provide a way of setting ASIC queuing parameters for PCIe block.
MAC port netdev egress qdiscs correspond to ASIC MAC Traffic
Manager block.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: add devlink_eswitch_mode_set callback

Our previous apps all assumed to use only one eswitch mode (legacy
or switchdev) without the ability to change it. ABM NIC will
want to support the switch so plumb devlink_eswitch_mode_set through.
The devlink_eswitch_mode_set is expected to spawn representors and
potentially devlink ports so it's called under big devlink lock and
pf->lock.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

devlink: don't take instance lock around eswitch mode set

Changing switch mode may want to register and unregister devlink
ports. Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it
should not take the instance lock. Drivers don't depend on existing
locking since it's a very recent addition.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: add app pointer to port representors

nfp_apps can currently associate their structures with vNICs but
not representors. Add app priv pointer to representors as well.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: abm: create project-specific vNIC structure

ABM NIC requires more complex vNIC handling, allocate
per-vNIC structure. Find out RX queue base and PCI PF id.
There will be multiple PFs sharing the same MAC port, therefore
the MAC address assigned to the vNIC must be looked up in the
HWInfo database.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfp: abm: add initial active buffer management NIC skeleton

Add a very rudimentary active buffer management NIC support.
For now it's like a core NIC without SR-IOV support. Next
commits will extend its functionality.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>