Git Repo - linux.git/log

taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte

The taprio budget needs to be adapted at runtime according to interface
link speed. But that handling is problematic.

For one thing, installing a qdisc on an interface that doesn't have
carrier is not illegal. But taprio prints the following stack trace:

[   31.851373] ------------[ cut here ]------------
[   31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4
[   31.864566] taprio: dequeue() called with unknown picos per byte.
[   31.864570] Modules linked in:
[   31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689
[   31.881398] Hardware name: Freescale LS1021A
[   31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
[   31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8)
[   31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
[   31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c)
[   31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4)
[   31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c)
[   31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc)
[   31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8)
[   31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8)
[   31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4)
[   31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c)
[   31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[   31.977076] Exception stack(0xe8167b20 to 0xe8167b68)
[   31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960
[   31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94
[   31.998363] 7b60: 60070013 ffffffff
[   32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8)
[   32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414)
[   32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28)
[   32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88)
[   32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44)
[   32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470)
[   32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724)
[   32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec)
[   32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110)
[   32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c)
[   32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380)
[   32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24)
[   32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228)
[   32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c)
[   32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[   32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0)
[   32.128960] 7fa0:                   b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000
[   32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c
[   32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64
[   32.150659] ---[ end trace 2139c9827c3e5177 ]---

This happens because the qdisc ->dequeue callback gets called. Which
again is not illegal, the qdisc will dequeue even when the interface is
up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames
will be dropped further down the stack in dev_direct_xmit().

And, at the end of the day, for what? For calculating the initial budget
of an interface which is non-operational at the moment and where frames
will get dropped anyway.

So if we can't figure out the link speed, default to SPEED_10 and move
along. We can also remove the runtime check now.

Cc: Leandro Dorileo <[email protected]>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

taprio: Fix kernel panic in taprio_destroy

taprio_init may fail earlier than this line:

list_add(&q->taprio_list, &taprio_list);

i.e. due to the net device not being multi queue.

Attempting to remove q from the global taprio_list when it is not part
of it will result in a kernel panic.

Fix it by matching list_add and list_del better to one another in the
order of operations. This way we can keep the deletion unconditional
and with lower complexity - O(1).

Cc: Leandro Dorileo <[email protected]>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qed-Enhancements'

Sudarsana Reddy Kalluru says:

====================
qed*: Enhancements.

The patch series adds couple of enhancements to qed/qede drivers.
  - Support for dumping the config id attributes via ethtool -w/W.
  - Support for dumping the GRC data of required memory regions using
    ethtool -w/W interfaces.

Patch (1) adds driver APIs for reading the config id attributes.
Patch (2) adds ethtool support for dumping the config id attributes.
Patch (3) adds support for configuring the GRC dump config flags.
Patch (4) adds ethtool support for dumping the grc dump.
====================

Signed-off-by: David S. Miller <[email protected]>

qede: Add support for dumping the grc data.

This patch adds driver support for configuring grc dump config flags, and
dumping the grc data via ethtool get/set-dump interfaces.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Add APIs for configuring grc dump config flags.

The patch adds driver support for configuring the grc dump config flags.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qede: Add support for reading the config id attributes.

Add driver support for dumping the config id attributes via ethtool dump
interfaces.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Add APIs for reading config id attributes.

The patch adds driver support for reading the config id attributes from NVM
flash partition.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'Dynamic-toggling-of-vlan_filtering-for-SJA1105-DSA'

Vladimir Oltean says:

====================
Dynamic toggling of vlan_filtering for SJA1105 DSA

This patchset addresses a limitation in dsa_8021q where this sequence of
commands was causing the switch to stop forwarding traffic:

  ip link add name br0 type bridge vlan_filtering 0
  ip link set dev swp2 master br0
  echo 1 > /sys/class/net/br0/bridge/vlan_filtering
  echo 0 > /sys/class/net/br0/bridge/vlan_filtering

The issue has to do with the VLAN table manipulations that dsa_8021q
does without notifying the bridge layer. The solution is to always
restore the VLANs that the bridge knows about, when disabling tagging.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering

The bridge core assumes that enabling/disabling vlan_filtering will
translate into the simple toggling of a flag for switchdev drivers.

That is clearly not the case for sja1105, which alters the VLAN table
and the pvids in order to obtain port separation in standalone mode.

There are 2 parts to the issue.

First, tag_8021q changes the pvid to a unique per-port rx_vid for frame
identification. But we need to disable tag_8021q when vlan_filtering
kicks in, and at that point, the VLAN configured as pvid will have to be
removed from the filtering table of the ports. With an invalid pvid, the
ports will drop all traffic. Since the bridge will not call any vlan
operation through switchdev after enabling vlan_filtering, we need to
ensure we're in a functional state ourselves. Hence read the pvid that
the bridge is aware of, and program that into our ports.

Secondly, tag_8021q uses the 1024-3071 range privately in
vlan_filtering=0 mode. Had the user installed one of these VLANs during
a previous vlan_filtering=1 session, then upon the next tag_8021q
cleanup for vlan_filtering to kick in again, VLANs in that range will
get deleted unconditionally, hence breaking user expectation. So when
deleting the VLANs, check if the bridge had knowledge about them, and if
it did, re-apply the settings. Wrap this logic inside a
dsa_8021q_vid_apply helper function to reduce code duplication.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bridge: Populate the pvid flag in br_vlan_get_info

Currently this simplified code snippet fails:

br_vlan_get_pvid(netdev, &pvid);
br_vlan_get_info(netdev, pvid, &vinfo);
ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID));

It is intuitive that the pvid of a netdevice should have the
BRIDGE_VLAN_INFO_PVID flag set.

However I can't seem to pinpoint a commit where this behavior was
introduced. It seems like it's been like that since forever.

At a first glance it would make more sense to just handle the
BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay
explains:

  There are a few reasons why we don't do it, most importantly because
  we need to have only one visible pvid at any single time, even if it's
  stale - it must be just one. Right now that rule will not be violated
  by this change, but people will try using this flag and could see two
  pvids simultaneously. You can see that the pvid code is even using
  memory barriers to propagate the new value faster and everywhere the
  pvid is read only once.  That is the reason the flag is set
  dynamically when dumping entries, too.  A second (weaker) argument
  against would be given the above we don't want another way to do the
  same thing, specifically if it can provide us with two pvids (e.g. if
  walking the vlan list) or if it can provide us with a pvid different
  from the one set in the vg. [Obviously, I'm talking about RCU
  pvid/vlan use cases similar to the dumps.  The locked cases are fine.
  I would like to avoid explaining why this shouldn't be relied upon
  without locking]

So instead of introducing the above change and making sure of the pvid
uniqueness under RCU, simply dynamically populate the pvid flag in
br_vlan_get_info().

Signed-off-by: Vladimir Oltean <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: fill regmap_config name

Use the register value width as the regmap_config name to prevent the
following error when the second and third regmap_configs are
initialized.
"debugfs: Directory '${bus-id}' with parent 'regmap' already present!"

Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'batadv-net-for-davem-20190830' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are two batman-adv bugfixes:

- Fix OGM and OGMv2 header read boundary check,
by Sven Eckelmann (2 patches)
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'batadv-next-for-davem-20190830' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This maintenance patchset includes the following patches:

- Add Sven to the MAINTAINERS file, by Simon Wunderlich
====================

Signed-off-by: David S. Miller <[email protected]>

tools/power turbostat: update version number

Today is 19.08.31, at least in some parts of the world.

Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL

Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") and the commit 3316f99a9f1b68c578c5 ("tools/power
turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h
RAPL support.

Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX,
and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon
Dhyana Family 18h support for RAPL.

Already tested on Hygon multi-node systems and it shows correct per-core
energy usage and the total package power.

Signed-off-by: Pu Wen <[email protected]>
Reviewed-by: Calvin Walton <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Fix caller parameter of get_tdp_amd()

Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU
family. But the rapl_probe_amd() function use wrong model parameter.
Fix the wrong caller parameter of get_tdp_amd() to use family.

Cc: <[email protected]> # v5.1+
Signed-off-by: Pu Wen <[email protected]>
Reviewed-by: Calvin Walton <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Fix CPU%C1 display value

In some case C1% will be wrong value, when platform doesn't have MSR for
C1 residency.

For example:
Core    CPU     CPU%c1
-       -       100.00
0       0       100.00
0       2       100.00
1       1       100.00
1       3       100.00

But adding Busy% will fix this
Core    CPU     Busy%   CPU%c1
-       -       99.77   0.23
0       0       99.77   0.23
0       2       99.77   0.23
1       1       99.77   0.23
1       3       99.77   0.23

This issue can be reproduced on most of the recent systems including
Broadwell, Skylake and later.

This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then
mperf value will not be read from MSR, so it will be 0. But this
is required for C1% calculation when MSR for C1 residency is not present.
Same is true for C3, C6 and C7 column selection.

So add another define DO_BIC_READ(), which doesn't depend on user
column selection and use for mperf, C3, C6 and C7 related counters.
So when there is no platform support for C1 residency counters,
we still read these counters, if the CPU has support and user selected
display of CPU%c1.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: do not enforce 1ms

Turbostat works by taking a snapshot of counters, sleeping, taking another
snapshot, calculating deltas, and printing out the table.

The sleep time is controlled via -i option or by user sending a signal or a
character to stdin. In the latter case, turbostat always adds 1 ms
sleep before it reads the counters, in order to avoid larger imprecisions
in the results in prints.

While the 1 ms delay may be a good idea for a "dumb" user, it is a
problem for an "aware" user. I do thousands and thousands of measurements
over a short period of time (like 2ms), and turbostat unconditionally adds
a 1ms to my interval, so I cannot get what I really need.

This patch removes the unconditional 1ms sleep. This is an expert user
tool, after all, and non-experts will unlikely ever use it in the non-fixed
interval mode anyway, so I think it is OK to remove the 1ms delay.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: read from pipes too

Commit '47936f944e78 tools/power turbostat: fix printing on input' make
a valid fix, but it completely disabled piped stdin support, which is
a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read
anything from it, so it becomes impossible to get turbostat output at
user-defined moments, instead of the regular intervals.

There is no reason why this should works for terminals, but not for
pipes. This patch improves the situation. Instead of ignoring pipes, we
read data from them but gracefully handle the EOF case.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Add Ice Lake NNPI support

This enables turbostat utility on Ice Lake NNPI SoC.

Link: https://lkml.org/lkml/2019/6/5/1034
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: rename has_hsw_msrs()

Perhaps if this more descriptive name had been used,
then we wouldn't have had the HSW ULT vs HSW CORE bug,
fixed by the previous commit.

Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Fix Haswell Core systems

turbostat: cpu0: msr offset 0x630 read failed: Input/output error

because Haswell Core does not have C8-C10.

Output C8-C10 only on Haswell ULT.

Fixes: f5a4c76ad7de ("tools/power turbostat: consolidate duplicate model numbers")
Reported-by: Prarit Bhargava <[email protected]>
Suggested-by: Kosuke Tatsukawa <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: add Jacobsville support

Jacobsville behaves like Denverton.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: fix buffer overrun

turbostat could be terminated by general protection fault on some latest
hardwares which (for example) support 9 levels of C-states and show 18
"tADDED" lines. That bloats the total output and finally causes buffer
overrun. So let's extend the buffer to avoid this.

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: fix file descriptor leaks

Fix file descriptor leaks by closing fp before return.

Addresses-Coverity-ID: 1444591 ("Resource leak")
Addresses-Coverity-ID: 1444592 ("Resource leak")
Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Prarit Bhargava <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: fix leak of file descriptor on error return path

Currently the error return path does not close the file fp and leaks
a file descriptor. Fix this by closing the file.

Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: Make interval calculation per thread to reduce jitter

Turbostat currently normalizes TSC and other values by dividing by an
interval. This interval is the delta between the start of one global
(all counters on all CPUs) sampling and the start of another. However,
this introduces a lot of jitter into the data.

In order to reduce jitter, the interval calculation should be based on
timestamps taken per thread and close to the start of the thread's
sampling.

Define a per thread time value to hold the delta between samples taken
on the thread.

Use the timestamp taken at the beginning of sampling to calculate the
delta.

Move the thread's beginning timestamp to after the CPU migration to
avoid jitter due to the migration.

Use the global time delta for the average time delta.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power turbostat: remove duplicate pc10 column

Remove the duplicate pc10 column.

Fixes: be0e54c4ebbf ("turbostat: Build-in "Low Power Idle" counters support")
Reported-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power x86_energy_perf_policy: Fix argument parsing

The -w argument in x86_energy_perf_policy currently triggers an
unconditional segfault.

This is because the argument string reads: "+a:c:dD:E:e:f:m:M:rt:u:vw" and
yet the argument handler expects an argument.

When parse_optarg_string is called with a null argument, we then proceed to
crash in strncmp, not horribly friendly.

The man page describes -w as taking an argument, the long form
(--hwp-window) is correctly marked as taking a required argument, and the
code expects it.

As such, this patch simply marks the short form (-w) as requiring an
argument.

Signed-off-by: Zephaniah E. Loss-Cutler-Hull <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power: Fix typo in man page

From context, we mean EPB (Enegry Performance Bias).

Signed-off-by: Matt Lupfer <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power/x86: Enable compiler optimisations and Fortify by default

Compiling without optimisations is silly, especially since some
warnings depend on the optimiser. Use -O2.

Fortify adds warnings for unchecked I/O (among other things), which
seems to be a good idea for user-space code. Enable that too.

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Len Brown <[email protected]>

tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2

x86_energy_perf_policy first uses __get_cpuid() to check the maximum
CPUID level and exits if it is too low.  It then assumes that later
calls will succeed (which I think is architecturally guaranteed).  It
also assumes that CPUID works at all (which is not guaranteed on
x86_32).

If optimisations are enabled, gcc warns about potentially
uninitialized variables.  Fix this by adding an exit-on-error after
every call to __get_cpuid() instead of just checking the maximum
level.

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Len Brown <[email protected]>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"I2C has a bunch of driver fixes and a core improvement to make the
  on-going API transition more robust"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mediatek: disable zero-length transfers for mt8183
  i2c: iproc: Stop advertising support of SMBUS quick cmd
  MAINTAINERS: i2c mv64xxx: Update documentation path
  i2c: piix4: Fix port selection for AMD Family 16h Model 30h
  i2c: designware: Synchronize IRQs when unregistering slave client
  i2c: i801: Avoid memory leak in check_acpi_smo88xx_device()
  i2c: make i2c_unregister_device() ERR_PTR safe

Merge tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Small fixes and minor cleanups for tracing:

   - Make exported ftrace function not static

   - Fix NULL pointer dereference in reading probes as they are created

   - Fix NULL pointer dereference in k/uprobe clean up path

   - Various documentation fixes"

* tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Correct kdoc formats
  ftrace/x86: Remove mcount() declaration
  tracing/probe: Fix null pointer dereference
  tracing: Make exported ftrace_set_clr_event non-static
  ftrace: Check for successful allocation of hash
  ftrace: Check for empty hash and comment the race with registering probes
  ftrace: Fix NULL pointer dereference in t_probe_next()

Merge tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Paul Walmsley:
"One significant fix for 32-bit RISC-V systems:

  Fix the RV32 memory map to prevent userspace from corrupting the
  FIXMAP area. Without this patch, the system can crash very early
  during the boot"

* tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix FIXMAP area corruption on RV32 systems

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
"PPC:
   - Fix bug which could leave locks held in the host on return to a
     guest.

  x86:
   - Prevent infinitely looping emulation of a failing syscall while
     single stepping.

   - Do not crash the host when nesting is disabled"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Don't update RIP or do single-step on faulting emulation
  KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
  KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling

Merge branch 'akpm' (patches from Andrew)

Merge misc mm fixes from Andrew Morton:
"7 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  mm: memcontrol: fix percpu vmstats and vmevents flush
  mm, memcg: do not set reclaim_state on soft limit reclaim
  mailmap: add aliases for Dmitry Safonov
  mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
  mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
  mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
  mm: memcontrol: flush percpu slab vmstats on kmem offlining

tracing: Correct kdoc formats

Fix the following kdoc warnings:

kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single'
kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer'
kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk'

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ftrace/x86: Remove mcount() declaration

Commit 562e14f72292 ("ftrace/x86: Remove mcount support") removed the
support for using mcount, so we could remove the mcount() declaration
to clean up.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jisheng Zhang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing/probe: Fix null pointer dereference

BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0
Read of size 8 at addr 0000000000000000 by task syz-executor.0/9746
trace_probe_cleanup+0x8d/0xd0
free_trace_kprobe.part.14+0x15/0x50
alloc_trace_kprobe+0x23e/0x250

Link: http://lkml.kernel.org/r/[email protected]
Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs")
Signed-off-by: Xinpeng Liu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Make exported ftrace_set_clr_event non-static

The function ftrace_set_clr_event is declared static and marked
EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
function was decided to be a part of API, this commit removes the static
attribute and adds the declaration to the header.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances")
Reviewed-by: Joe Jin <[email protected]>
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

udp: Remove unlikely() from IS_ERR*() condition

"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.

Signed-off-by: Denis Efremov <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>

net/mlx5e: Remove unlikely() from WARN*() condition

"unlikely(WARN_ON_ONCE(x))" is excessive. WARN_ON_ONCE() already uses
unlikely() internally.

Signed-off-by: Denis Efremov <[email protected]>
Cc: Boris Pismenny <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a potential crash in the ccp driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Ignore unconfigured CCP device on suspend/resume

Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"

Commit dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") made
the kfifo code round the number of elements up.  That was good for
__kfifo_alloc(), but it's actually wrong for __kfifo_init().

The difference? __kfifo_alloc() will allocate the rounded-up number of
elements, but __kfifo_init() uses an allocation done by the caller.  We
can't just say "use more elements than the caller allocated", and have
to round down.

The good news? All the normal cases will be using power-of-two arrays
anyway, and most users of kfifo's don't use kfifo_init() at all, but one
of the helper macros to declare a KFIFO that enforce the proper
power-of-two behavior.  But it looks like at least ibmvscsis might be
affected.

The bad news? Will Deacon refers to an old thread and points points out
that the memory ordering in kfifo's is questionable.  See

  https://lore.kernel.org/lkml/20181211034032 [email protected]/

for more.

Fixes: dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()")
Reported-by: laokz <[email protected]>
Cc: Stefani Seibold <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: fix percpu vmstats and vmevents flush

Instead of using raw_cpu_read() use per_cpu() to read the actual data of
the corresponding cpu otherwise we will be reading the data of the
current cpu for the number of online CPUs.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg")
Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg")
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, memcg: do not set reclaim_state on soft limit reclaim

Adric Blake has noticed[1] the following warning:

  WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 set_task_reclaim_state+0x1e/0x40
  [...]
  Call Trace:
   mem_cgroup_shrink_node+0x9b/0x1d0
   mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
   balance_pgdat+0x276/0x540
   kswapd+0x200/0x3f0
   ? wait_woken+0x80/0x80
   kthread+0xfd/0x130
   ? balance_pgdat+0x540/0x540
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40
  ---[ end trace 727343df67b2398a ]---

which tells us that soft limit reclaim is about to overwrite the
reclaim_state configured up in the call chain (kswapd in this case but
the direct reclaim is equally possible).  This means that reclaim stats
would get misleading once the soft reclaim returns and another reclaim
is done.

Fix the warning by dropping set_task_reclaim_state from the soft reclaim
which is always called with reclaim_state set up.

[1] http://lkml.kernel.org/r/CAE1jjeePxYPvw1mw2B3v803xHVR_BNnz0hQUY_JDMN8ny29M6w@mail.gmail.com

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Adric Blake <[email protected]>
Acked-by: Yafang Shao <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hillf Danton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mailmap: add aliases for Dmitry Safonov

I don't work for Virtuozzo or Samsung anymore and I've noticed that they
have started sending annoying html email-replies.

And I prioritize my personal emails over work email box, so while at it
add an entry for Arista too - so I can reply faster when needed.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate

Fix lock/unlock imbalance by unlocking *zhdr* before return.

Addresses Coverity ID 1452811 ("Missing unlock")

Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Henry Burns <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
with the hierarchical ones") effectively decreased the precision of
per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.

That's good for displaying in memory.stat, but brings a serious
regression into the reclaim process.

One issue I've discovered and debugged is the following:
lruvec_lru_size() can return 0 instead of the actual number of pages in
the lru list, preventing the kernel to reclaim last remaining pages.
Result is yet another dying memory cgroups flooding.  The opposite is
also happening: scanning an empty lru list is the waste of cpu time.

Also, inactive_list_is_low() can return incorrect values, preventing the
active lru from being scanned and freed.  It can fail both because the
size of active and inactive lists are inaccurate, and because the number
of workingset refaults isn't precise.  In other words, the result is
pretty random.

I'm not sure, if using the approximate number of slab pages in
count_shadow_number() is acceptable, but issues described above are
enough to partially revert the patch.

Let's keep per-memcg vmstat_local batched (they are only used for
displaying stats to the userspace), but keep lruvec stats precise.  This
change fixes the dead memcg flooding on my setup.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Yafang Shao <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n

Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%[email protected]
Reported-by: kbuild test robot <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Henry Burns <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Jonathan Adams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: flush percpu slab vmstats on kmem offlining

I've noticed that the "slab" value in memory.stat is sometimes 0, even
if some children memory cgroups have a non-zero "slab" value.  The
following investigation showed that this is the result of the kmem_cache
reparenting in combination with the per-cpu batching of slab vmstats.

At the offlining some vmstat value may leave in the percpu cache, not
being propagated upwards by the cgroup hierarchy.  It means that stats
on ancestor levels are lower than actual.  Later when slab pages are
released, the precise number of pages is substracted on the parent
level, making the value negative.  We don't show negative values, 0 is
printed instead.

To fix this issue, let's flush percpu slab memcg and lruvec stats on
memcg offlining.  This guarantees that numbers on all ancestor levels
are accurate and match the actual number of outstanding slab pages.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Spurious warning when loading rules using the physdev match,
from Todd Seidelmann.

2) Fix FTP conntrack helper debugging output, from Thomas Jarosch.

3) Restore per-netns nf_conntrack_{acct,helper,timeout} sysctl knobs,
from Florian Westphal.

4) Clear skbuff timestamp from the flowtable datapath, also from Florian.

5) Fix incorrect byteorder of NFT_META_BRI_IIFVPROTO, from wenxu.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-08-31

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.

2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Fix compile error regression with CONFIG_BNXT_SRIOV not set.

Add a new function bnxt_get_registered_vfs() to handle the work
of getting the number of registered VFs under #ifdef CONFIG_BNXT_SRIOV.
The main code will call this function and will always work correctly
whether CONFIG_BNXT_SRIOV is set or not.

Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'Fixes-for-unlocked-cls-hardware-offload-API-refactoring'

Vlad Buslov says:

====================
Fixes for unlocked cls hardware offload API refactoring

Two fixes for my "Refactor cls hardware offload API to support
rtnl-independent drivers" series.
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx5e: Move local var definition into ifdef block

New local variable "struct flow_block_offload *f" was added to
mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
results compilation warning about unused variable when CONFIG_MLX5_ESWITCH
is not set. Move the variable definition into eswitch-specific code block
from the beginning of mlx5e_setup_tc() function.

Fixes: c9f14470d048 ("net: sched: add API for registering unlocked offload block callbacks")
Reported-by: tanhuazhong <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: cls_matchall: cleanup flow_action before deallocating

Recent rtnl lock removal patch changed flow_action infra to require proper
cleanup besides simple memory deallocation. However, matchall classifier
was not updated to call tc_cleanup_flow_action(). Add proper cleanup to
mall_replace_hw_filter() and mall_reoffload().

Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads")
Reported-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp_bbr: clarify that bbr_bdp() rounds up in comments

This explicitly clarifies that bbr_bdp() returns the rounded-up value of
the bandwidth-delay product and why in the comments.

Signed-off-by: Luke Hsiao <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Priyaranjan Jha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sched: act_vlan: implement stats_update callback

Implement this callback in order to get the offloaded stats added to the
kernel stats.

Reported-by: Pengfei Liu <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]

When a local endpoint is ceases to be in use, such as when the kafs module
is unloaded, the kernel will emit an assertion failure if there are any
outstanding client connections:

rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:433!

and even beyond that, will evince other oopses if there are service
connections still present.

Fix this by:

(1) Removing the triggering of connection reaping when an rxrpc socket is
     released.  These don't actually clean up the connections anyway - and
     further, the local endpoint may still be in use through another
     socket.

(2) Mark the local endpoint as dead when we start the process of tearing
     it down.

(3) When destroying a local endpoint, strip all of its client connections
     from the idle list and discard the ref on each that the list was
     holding.

(4) When destroying a local endpoint, call the service connection reaper
     directly (rather than through a workqueue) to immediately kill off all
     outstanding service connections.

(5) Make the service connection reaper reap connections for which the
     local endpoint is marked dead.

Only after destroying the connections can we close the socket lest we get
an oops in a workqueue that's looking at a connection or a peer.

Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers")
Signed-off-by: David Howells <[email protected]>
Tested-by: Marc Dionne <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'rxrpc-fixes-20190827' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fix use of skb_cow_data()

Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
with skb_unshare() early on in the input process.  The problem that is
being seen is that skb_cow_data() indirectly requires that the maximum
usage count on an sk_buff be 1, and it may generate an assertion failure in
pskb_expand_head() if not.

This can occur because rxrpc_input_data() may be still holding a ref when
it has just attached the sk_buff to the rx ring and given that attachment
its own ref.  If recvmsg happens fast enough, skb_cow_data() can see the
ref still held by the softirq handler.

Further, a packet may contain multiple subpackets, each of which gets its
own attachment to the ring and its own ref - also making skb_cow_data() go
bang.

Fix this by:

(1) The DATA packet is currently parsed for subpackets twice by the input
     routines.  Parse it just once instead and make notes in the sk_buff
     private data.

(2) Use the notes from (1) when attaching the packet to the ring multiple
     times.  Once the packet is attached to the ring, recvmsg can see it
     and start modifying it, so the softirq handler is not permitted to
     look inside it from that point.

(3) Pass the ref from the input code to the ring rather than getting an
     extra ref.  rxrpc_input_data() uses a ref on the second refcount to
     prevent the packet from evaporating under it.

(4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
     before we take call->input_lock.  Other sorts of packets don't get
     modified and so can be left.

     A trace is emitted if skb_unshare() eats the skb.  Note that
     skb_share() for our accounting in this regard as we can't see the
     parameters in the packet to log in a trace line if it releases it.

(5) Remove the calls to skb_cow_data().  These are then no longer
     necessary.

There are also patches to improve the rxrpc_skb tracepoint to make sure
that Tx-derived buffers are identified separately from Rx-derived buffers
in the trace.
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: depend on COMMON_CLK

Fixes: 190f73ab4c43 ("net: stmmac: setup higher frequency clk support for EHL & TGL")
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: dwmac-rk: Don't fail if phy regulator is absent

The devicetree binding lists the phy phy as optional. As such, the
driver should not bail out if it can't find a regulator. Instead it
should just skip the remaining regulator related code and continue
on normally.

Skip the remainder of phy_power_on() if a regulator supply isn't
available. This also gets rid of the bogus return code.

Fixes: 2e12f536635f ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator")
Signed-off-by: Chen-Yu Tsai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

amd-xgbe: Fix error path in xgbe_mod_init()

In xgbe_mod_init(), we should do cleanup if some error occurs

Reported-by: Hulk Robot <[email protected]>
Fixes: efbaa828330a ("amd-xgbe: Add support to handle device renaming")
Fixes: 47f164deab22 ("amd-xgbe: Add PCI device support")
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

arcnet: capmode: remove redundant assignment to pointer pkt

Pointer pkt is being initialized with a value that is never read
and pkt is being re-assigned a little later on. The assignment is
redundant and hence can be removed.

Addresses-Coverity: ("Ununsed value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bnxt_en-health-and-error-recovery'

Michael Chan says:

====================
bnxt_en: health and error recovery.

This patchset implements adapter health and error recovery.  The status
is reported through several devlink reporters and the driver will
initiate and complete the recovery process using the devlink infrastructure.

v2: Added 4 patches at the beginning of the patchset to clean up error code
    handling related to firmware messages and to convert to use standard
    error codes.

    Removed the dropping of rtnl_lock in bnxt_close().

    Broke up the patches some more for better patch organization and
    future bisection.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add FW fatal devlink_health_reporter.

Health show command example and output:

$ devlink health show pci/0000:af:00.0 reporter fw_fatal

pci/0000:af:00.0:
name fw_fatal
state healthy error 1 recover 1 grace_period 0 auto_recover true

Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.

We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.

Cc: Jiri Pirko <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors.

This call will handle fatal firmware errors by forcing a reset on the
firmware. The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages. From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.

The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().

This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Do not send firmware messages if firmware is in error state.

Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller. Fix up some clean up functions to continue
and not abort when the firmware message function returns error.

This is preparation work to fully handle firmware error recovery
under fatal conditions.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.

Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add devlink health reset reporter.

Add devlink health reporter for the firmware reset event. Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.

Cc: Jiri Pirko <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Handle firmware reset.

Add the bnxt_fw_reset() main function to handle firmware reset. This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected. bnxt_fw_reset()
will first wait for all VFs to shutdown and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.

The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Handle RESET_NOTIFY async event from firmware.

This event from firmware signals a coordinated reset initiated by the
firmware. It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.

We store the parameters from this event. Subsequent patches will
add logic to handle reset itself using devlink reporters.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add new FW devlink_health_reporter

Create new FW devlink_health_reporter, to know the current health
status of FW.

Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw

pci/0000:af:00.0:
name fw
state healthy error 0 recover 0

FW status: Healthy; Reset count: 1

Cc: Jiri Pirko <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add BNXT_STATE_IN_FW_RESET state.

The new flag will be set in subsequent patches when firmware is
going through reset. If bnxt_close() is called while the new flag
is set, the FW reset sequence will have to be aborted because the
NIC is prematurely closed before FW reset has completed. We also
reject SRIOV configurations while FW reset is in progress.

v2: No longer drop rtnl_lock() in close and wait for FW reset to complete.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Enable health monitoring.

Handle the async event from the firmware that enables firmware health
monitoring. Store initial health metrics.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Pre-map the firmware health monitoring registers.

Pre-map the GRC registers for periodic firmware health monitoring.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Discover firmware error recovery capabilities.

Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Handle firmware reset status during IF_UP.

During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset. Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary. If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Register buffers for VFs before reserving resources.

When VFs need to be reconfigured dynamically after firmwware reset, the
configuration sequence on the PF needs to be changed to register the VF
buffers first. Otherwise, some VF firmware commands may not succeed as
there may not be PF buffers ready for the re-directed firmware commands.

This sequencing did not matter much before when we only supported
the normal bring-up of VFs.

Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Refactor bnxt_sriov_enable().

Refactor the hardware/firmware configuration portion in
bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov(). This
new function can be called after a firmware reset to reconfigure the
VFs previously enabled.

v2: straight refactor of the code. Reordering done in the next patch.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Prepare bnxt_init_one() to be called multiple times.

In preparation for the new firmware reset feature, some of the logic
in bnxt_init_one() and related functions will be called again after
firmware has reset.  Reset some of the flags and capabilities so that
everything that can change can be re-initialized.  Refactor some
functions to probe firmware versions and capabilities.  Check some
buffers before allocating as they may have been allocated previously.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.

If the silent parameter is set, suppress all messages when there is
no response from firmware.  When polling for firmware to come out of
reset, no response may be normal and we want to suppress the error
messages.  Also, don't poll for the firmware DMA response if Bus Master
is disabled.  This is in preparation for error recovery when firmware
may be in error or reset state or Bus Master is disabled.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Simplify error checking in the SR-IOV message forwarding functions.

There are 4 functions handling message forwarding for SR-IOV.  They
check for non-zero firmware response code and then return -1.  There
is no need to do this anymore.  The main messaging function will
now return standard error code.  Since we don't need to examine the
response, we can use the hwrm_send_message() variant which will
take the mutex automatically.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Convert error code in firmware message response to standard code.

The main firmware messaging function returns the firmware defined error
code and many callers have to convert to standard error code for proper
propagation to userspace. Convert bnxt_hwrm_do_send_msg() to return
standard error code so we can do away with all the special error code
handling by the many callers.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg().

Replace the non-standard -1 code with -EBUSY when there is no firmware
response after waiting for the maximum timeout.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Use a common function to print the same ethtool -f error message.

The same message is printed 3 times in the code, so use a common function
to do that.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ioc3-eth-improvements'

Thomas Bogendoerfer says:

====================
ioc3-eth improvements

In my patch series for splitting out the serial code from ioc3-eth
by using a MFD device there was one big patch for ioc3-eth.c,
which wasn't really usefull for reviews. This series contains the
ioc3-eth changes splitted in smaller steps and few more cleanups.
Only the conversion to MFD will be done later in a different series.

Changes in v3:
- no need to check skb == NULL before passing it to dev_kfree_skb_any
- free memory allocated with get_page(s) with free_page(s)
- allocate rx ring with just GFP_KERNEL
- add required alignment for rings in comments

Changes in v2:
- use net_err_ratelimited for printing various ioc3 errors
- added missing clearing of rx buf valid flags into ioc3_alloc_rings
- use __func__ for printing out of memory messages
====================

Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: no need to stop queue set_multicast_list

netif_stop_queue()/netif_wake_qeue() aren't needed for changing
multicast filters.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: protect emcr in all cases

emcr in private struct wasn't always protected by spinlock.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: Fix IPG settings

The half/full duplex settings for inter packet gap counters/timer were
reversed.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: use csum_fold

replace open coded checksum folding by csum_fold.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: use dma-direct for dma allocations

Replace the homegrown DMA memory allocation, which only works on
SGI-IP27 machines, with the generic dma allocations.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: refactor rx buffer allocation

Move common code for rx buffer setup into ioc3_alloc_skb and deal
with allocation failures. Also clean up allocation size calculation.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: split ring cleaning/freeing and allocation

Do tx ring cleaning and freeing of rx buffers, when chip is shutdown and
allocate buffers before bringing chip up.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: introduce chip start function

ioc3_init did everything from reset to init rings to starting the chip.
This change move out chip start into a new function as preparation
for easier handling of receive buffer allocation failures.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: separate tx and rx ring handling

After allocation of descriptor memory is now done once in probe
handling of tx ring is completely done by ioc3_clean_tx_ring. So
we remove the remaining tx ring actions out of ioc3_alloc_rings
and ioc3_free_rings and rename it to ioc3_[alloc|free]_rx_bufs
to better describe what they are doing.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sgi: ioc3-eth: get rid of ioc3_clean_rx_ring()

Move clearing of the descriptor valid bit into ioc3_alloc_rings. This
makes ioc3_clean_rx_ring obsolete.

Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>