Git Repo - linux.git/log

um: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

clone3: ensure copy_thread_tls is implemented

copy_thread implementations handle CLONE_SETTLS by reading the TLS
value from the registers containing the syscall arguments for
clone. This doesn't work with clone3 since the TLS value is passed
in clone_args instead.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

xtensa: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

riscv: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

parisc: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

arm: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

arm64: Implement copy_thread_tls

This is required for clone3 which passes the TLS value through a
struct rather than a register.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers

Previously this was only defined in the internal headers which
resulted in __NR_clone3 not being defined in the user headers.

Signed-off-by: Amanieu d'Antras <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 5.3.x
Reviewed-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

gpiolib: acpi: Add honor_wakeup module-option + quirk mechanism

On some laptops enabling wakeup on the GPIO interrupts used for ACPI _AEI
event handling causes spurious wakeups.

This commit adds a new honor_wakeup option, defaulting to true (our current
behavior), which can be used to disable wakeup on troublesome hardware
to avoid these spurious wakeups.

This is a workaround for an architectural problem with s2idle under Linux
where we do not have any mechanism to immediately go back to sleep after
wakeup events, other then for embedded-controller events using the standard
ACPI EC interface, for details see:
https://lore.kernel.org/linux-acpi/61450f9b-cbc6-0c09-8b3a-aff6bf9a0b3c@redhat.com/

One series of laptops which is not able to suspend without this workaround
is the HP x2 10 Cherry Trail models, this commit adds a DMI based quirk
which makes sets honor_wakeup to false on these models.

Cc: [email protected]
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

gpiolib: acpi: Turn dmi_system_id table into a generic quirk table

Turn the existing run_edge_events_on_boot_blacklist dmi_system_id table
into a generic quirk table, storing the quirks in the driver_data ptr.

This is a preparation patch for adding other types of (DMI based) quirks.

Cc: [email protected]
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

powercap: intel_rapl: add NULL pointer check to rapl_mmio_cpu_online()

RAPL MMIO support depends on the RAPL common driver.  During CPU
initialization rapl_mmio_cpu_online() is called via CPU hotplug
to initialize the MMIO RAPL for the new CPU, but if that CPU is
not present in the common RAPL driver's support list, rapl_defaults
is NULL and the kernel crashes on an attempt to dereference it:

[    4.188566] BUG: kernel NULL pointer dereference, address: 0000000000000020
...snip...
[    4.189555] RIP: 0010:rapl_add_package+0x223/0x574
[    4.189555] Code: b5 a0 31 c0 49 8b 4d 78 48 01 d9 48 8b 0c c1 49 89 4c c6 10 48 ff c0 48 83 f8 05 75 e7 49 83 ff 03 75 15 48 8b 05 09 bc 18 01 <8b> 70 20 41 89 b6 0c 05 00 00 85 f6 75 1a 49 81 c6 18 9
[    4.189555] RSP: 0000:ffffb3adc00b3d90 EFLAGS: 00010246
[    4.189555] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000000
[    4.267161] usb 1-1: New USB device found, idVendor=2109, idProduct=2812, bcdDevice= b.e0
[    4.189555] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff9340caafd000
[    4.189555] RBP: ffffb3adc00b3df8 R08: ffffffffa0246e28 R09: ffff9340caafc000
[    4.189555] R10: 000000000000024a R11: ffffffff9ff1f6f2 R12: 00000000ffffffed
[    4.189555] R13: ffff9340caa94800 R14: ffff9340caafc518 R15: 0000000000000003
[    4.189555] FS:  0000000000000000(0000) GS:ffff9340ce200000(0000) knlGS:0000000000000000
[    4.189555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.189555] CR2: 0000000000000020 CR3: 0000000302c14001 CR4: 00000000003606f0
[    4.189555] Call Trace:
[    4.189555]  ? __switch_to_asm+0x40/0x70
[    4.189555]  rapl_mmio_cpu_online+0x47/0x64
[    4.189555]  ? rapl_mmio_write_raw+0x33/0x33
[    4.281059] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    4.189555]  cpuhp_invoke_callback+0x29f/0x66f
[    4.189555]  ? __schedule+0x46d/0x6a0
[    4.189555]  cpuhp_thread_fun+0xb9/0x11c
[    4.189555]  smpboot_thread_fn+0x17d/0x22f
[    4.297006] usb 1-1: Product: USB2.0 Hub
[    4.189555]  ? cpu_report_death+0x43/0x43
[    4.189555]  kthread+0x137/0x13f
[    4.189555]  ? cpu_report_death+0x43/0x43
[    4.189555]  ? kthread_blkcg+0x2e/0x2e
[    4.312951] usb 1-1: Manufacturer: VIA Labs, Inc.
[    4.189555]  ret_from_fork+0x1f/0x40
[    4.189555] Modules linked in:
[    4.189555] CR2: 0000000000000020
[    4.189555] ---[ end trace 01bb812aabc791f4 ]---

To avoid that problem, check rapl_defaults NULL upfront and return an
error code if it is NULL.  [Note that it does not make sense to even
try to allocate memory in that case, because it is not going to be
used anyway.]

Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
Cc: 5.3+ <[email protected]> # 5.3+
Signed-off-by: Harry Pan <[email protected]>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only

Revert changes done in commit f6ec9483091f ("drm/i915: extend audio
CDCLK>=2*BCLK constraint to more platforms"). Audio drivers
communicate with i915 over HDA bus multiple times during system
boot-up and each of these transactions result in matching
get_power/put_power calls to i915, and depending on the platform,
a modeset change causing visible flicker.

GLK is the only platform with minimum CDCLK significantly lower
than BCLK, and thus for GLK setting a higher CDCLK is mandatory.

For other platforms, minimum CDCLK is close but below 2*BCLK
(e.g. on ICL, CDCLK=176.4kHz with BCLK=96kHz). Spec-wise the constraint
should be set, but in practise no communication errors have been
reported and the downside if set is the flicker observed at boot-time.

Revert to old behaviour until better mechanism to manage
probe-time clocks is available.

The full CDCLK>=2*BCLK constraint is still enforced at pipe
enable time in intel_crtc_compute_min_cdclk().

Bugzilla: https://gitlab.freedesktop.org/drm/intel/issues/913
Fixes: f6ec9483091f ("drm/i915: extend audio CDCLK>=2*BCLK constraint to more platforms")
Signed-off-by: Kai Vehmanen <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1ee48a61aa57dbdbc3cd2808d8b28df40d938e44)
Signed-off-by: Joonas Lahtinen <[email protected]>

drm/i915/gt: Mark up virtual engine uabi_instance

Be sure to initialise the uabi_instance on the virtual engine to the
special invalid value, just in case we ever peek at it from the uAPI.

Reported-by: Tvrtko Ursulin <[email protected]>
Fixes: 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.4+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit f75fc37b5e70b75f21550410f88e2379648120e2)
Signed-off-by: Joonas Lahtinen <[email protected]>

gpio: zynq: Fix for bug in zynq_gpio_restore_context API

This patch writes the inverse value of Interrupt Mask Status
register into the Interrupt Enable register in
zynq_gpio_restore_context API to fix the bug.

Fixes: e11de4de28c0 ("gpio: zynq: Add support for suspend resume")
Signed-off-by: Swapna Manupati <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
Signed-off-by: Srinivas Neeli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

pinctrl: meson: Fix wrong shift value when get drive-strength

In meson_pinconf_get_drive_strength, variable bit is calculated by
meson_calc_reg_and_bit, this value is the offset from the first pin of a
certain bank to current pin, while Meson SoCs use two bits for each pin
to depict drive-strength. So a left shift by 1 should be done or node
pinconf-pins shows wrong message.

Fixes: 6ea3e3bbef37 ("pinctrl: meson: add support of drive-strength-microamp")
Signed-off-by: Qianggui Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

pinctrl: lochnagar: select GPIOLIB

In a rare randconfig build I came across one configuration that does
not enable CONFIG_GPIOLIB, which is needed by lochnagar:

ERROR: "devm_gpiochip_add_data" [drivers/pinctrl/cirrus/pinctrl-lochnagar.ko] undefined!
ERROR: "gpiochip_generic_free" [drivers/pinctrl/cirrus/pinctrl-lochnagar.ko] undefined!
ERROR: "gpiochip_generic_request" [drivers/pinctrl/cirrus/pinctrl-lochnagar.ko] undefined!
ERROR: "gpiochip_get_data" [drivers/pinctrl/cirrus/pinctrl-lochnagar.ko] undefined!

Add another 'select' like all other pinctrl drivers have.

Fixes: 0548448b719a ("pinctrl: lochnagar: Add support for the Cirrus Logic Lochnagar")
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Charles Keepax <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull cpufreq driver fix for v5.5-rc6 from Viresh Kumar:

"Blacklist Tegra20/30 for probing by cpufreq-dt driver."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: dt-platdev: Blacklist NVIDIA Tegra20 and Tegra30 SoCs

drivers: thermal: tsens: Work with old DTBs

In order for the old DTBs to continue working, the new interrupt code
must not return an error if interrupts are not defined. Don't return an
error in case of -ENXIO.

Fixes: 634e11d5b450a ("drivers: thermal: tsens: Add interrupt support")
Suggested-by: Stephan Gerhold <[email protected]>
Signed-off-by: Amit Kucheria <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Tested-by: Bjorn Andersson <[email protected]>
Signed-off-by: Daniel Lezcano <[email protected]>
Link: https://lore.kernel.org/r/cea3317c5d793db312064d68b261ad420a4a81b1.1576146898.git.amit.kucheria@linaro.org

Merge tag 'mlx5-fixes-2020-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-01-06

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.3
('net/mlx5: Move devlink registration before interfaces load')

For -stable v5.4
('net/mlx5e: Fix hairpin RSS table size')
('net/mlx5: DR, Init lists that are used in rule's member')
('net/mlx5e: Always print health reporter message to dmesg')
('net/mlx5: DR, No need for atomic refcount for internal SW steering resources')
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:

   - kbuild found missing define of MCOUNT_INSN_SIZE for various build
     configs

   - Initialize variable to zero as gcc thinks it is used undefined (it
     really isn't but the code is subtle enough that this doesn't hurt)

   - Convert from do_div() to div64_ull() to prevent potential divide by
     zero

   - Unregister a trace point on error path in sched_wakeup tracer

   - Use signed offset for archs that can have stext not be first

   - A simple indentation fix (whitespace error)"

* tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix indentation issue
  kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
  tracing: Change offset type to s32 in preempt/irq tracepoints
  ftrace: Avoid potential division by zero in function profiler
  tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
  tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
  tracing: Initialize val to zero in parse_entry of inject code

net/mlx5: DR, Init lists that are used in rule's member

Whenever adding new member of rule object we attach it to 2 lists,
These 2 lists should be initialized first.

Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Erez Shitrit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix hairpin RSS table size

Set hairpin table size to the corret size, based on the groups that
would be created in it. Groups are laid out on the table such that a
group occupies a range of entries in the table. This implies that the
group ranges should have correspondence to the table they are laid upon.

The patch cited below made group 1's size to grow hence causing
overflow of group range laid on the table.

Fixes: a795d8db2a6d ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets")
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: DR, No need for atomic refcount for internal SW steering resources

No need for an atomic refcounter for the STE and hashtables.
These are internal SW steering resources and they are always
under domain mutex.

This also fixes the following refcount error:
  refcount_t: addition on 0; use-after-free.
  WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0
  Call Trace:
   dr_table_init_nic+0x10d/0x110 [mlx5_core]
   mlx5dr_table_create+0xb4/0x230 [mlx5_core]
   mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core]
   __mlx5_create_flow_table+0x221/0x5f0 [mlx5_core]
   esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core]
   ...

Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Alex Vesker <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Revert "net/mlx5: Support lockless FTE read lookups"

This reverts commit 7dee607ed0e04500459db53001d8e02f8831f084.

During cleanup path, FTE's parent node group is removed which is
referenced by the FTE while freeing the FTE.
Hence FTE's lockless read lookup optimization done in cited commit is
not possible at the moment.

Hence, revert the commit.

This avoid below KAZAN call trace.

[  110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60
[mlx5_core]
[  110.391048] Read of size 4 at addr ffff888c19e6d220 by task
swapper/12/0

[  110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+
[  110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70
08/02/2014
[  110.391225] Call Trace:
[  110.391229]  <IRQ>
[  110.391246]  dump_stack+0x95/0xd5
[  110.391307]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
[  110.391320]  print_address_description.constprop.5+0x20/0x320
[  110.391379]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
[  110.391435]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
[  110.391441]  __kasan_report+0x149/0x18c
[  110.391499]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
[  110.391504]  kasan_report+0x12/0x20
[  110.391511]  __asan_report_load4_noabort+0x14/0x20
[  110.391567]  find_root.isra.14+0x56/0x60 [mlx5_core]
[  110.391625]  del_sw_fte_rcu+0x4a/0x100 [mlx5_core]
[  110.391633]  rcu_core+0x404/0x1950
[  110.391640]  ? rcu_accelerate_cbs_unlocked+0x100/0x100
[  110.391649]  ? run_rebalance_domains+0x201/0x280
[  110.391654]  rcu_core_si+0xe/0x10
[  110.391661]  __do_softirq+0x181/0x66c
[  110.391670]  irq_exit+0x12c/0x150
[  110.391675]  smp_apic_timer_interrupt+0xf0/0x370
[  110.391681]  apic_timer_interrupt+0xf/0x20
[  110.391684]  </IRQ>
[  110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0
[  110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44
00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00
00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d
[  110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[  110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX:
000000000000001f
[  110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI:
ffff888c277365a8
[  110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09:
0000000000035280
[  110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12:
ffffffffb1017740
[  110.391723] R13: 0000000000000003 R14: 0000000000000003 R15:
0000000000000000
[  110.391732]  ? cpuidle_enter_state+0xea/0xba0
[  110.391738]  cpuidle_enter+0x4f/0xa0
[  110.391747]  call_cpuidle+0x6d/0xc0
[  110.391752]  do_idle+0x360/0x430
[  110.391758]  ? arch_cpu_idle_exit+0x40/0x40
[  110.391765]  ? complete+0x67/0x80
[  110.391771]  cpu_startup_entry+0x1d/0x20
[  110.391779]  start_secondary+0x2f3/0x3c0
[  110.391784]  ? set_cpu_sibling_map+0x2500/0x2500
[  110.391795]  secondary_startup_64+0xa4/0xb0

[  110.391841] Allocated by task 290:
[  110.391917]  save_stack+0x21/0x90
[  110.391921]  __kasan_kmalloc.constprop.8+0xa7/0xd0
[  110.391925]  kasan_kmalloc+0x9/0x10
[  110.391929]  kmem_cache_alloc_trace+0xf6/0x270
[  110.391987]  create_root_ns.isra.36+0x58/0x260 [mlx5_core]
[  110.392044]  mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core]
[  110.392092]  mlx5_load_one+0xc7a/0x3860 [mlx5_core]
[  110.392139]  init_one+0x6ff/0xf90 [mlx5_core]
[  110.392145]  local_pci_probe+0xde/0x190
[  110.392150]  work_for_cpu_fn+0x56/0xa0
[  110.392153]  process_one_work+0x678/0x1140
[  110.392157]  worker_thread+0x573/0xba0
[  110.392162]  kthread+0x341/0x400
[  110.392166]  ret_from_fork+0x1f/0x40

[  110.392218] Freed by task 2742:
[  110.392288]  save_stack+0x21/0x90
[  110.392292]  __kasan_slab_free+0x137/0x190
[  110.392296]  kasan_slab_free+0xe/0x10
[  110.392299]  kfree+0x94/0x250
[  110.392357]  tree_put_node+0x257/0x360 [mlx5_core]
[  110.392413]  tree_remove_node+0x63/0xb0 [mlx5_core]
[  110.392469]  clean_tree+0x199/0x240 [mlx5_core]
[  110.392525]  mlx5_cleanup_fs+0x76/0x580 [mlx5_core]
[  110.392572]  mlx5_unload+0x22/0xc0 [mlx5_core]
[  110.392619]  mlx5_unload_one+0x99/0x260 [mlx5_core]
[  110.392666]  remove_one+0x61/0x160 [mlx5_core]
[  110.392671]  pci_device_remove+0x10b/0x2c0
[  110.392677]  device_release_driver_internal+0x1e4/0x490
[  110.392681]  device_driver_detach+0x36/0x40
[  110.392685]  unbind_store+0x147/0x200
[  110.392688]  drv_attr_store+0x6f/0xb0
[  110.392693]  sysfs_kf_write+0x127/0x1d0
[  110.392697]  kernfs_fop_write+0x296/0x420
[  110.392702]  __vfs_write+0x66/0x110
[  110.392707]  vfs_write+0x1a0/0x500
[  110.392711]  ksys_write+0x164/0x250
[  110.392715]  __x64_sys_write+0x73/0xb0
[  110.392720]  do_syscall_64+0x9f/0x3a0
[  110.392725]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 7dee607ed0e0 ("net/mlx5: Support lockless FTE read lookups")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Move devlink registration before interfaces load

Register devlink before interfaces are added.
This will allow interfaces to use devlink while initalizing. For example,
call mlx5_is_roce_enabled.

Fixes: aba25279c100 ("net/mlx5e: Add TX reporter support")
Signed-off-by: Michael Guralnik <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Always print health reporter message to dmesg

In case a reporter exists, error message is logged only to the devlink
tracer. The devlink tracer is a visibility utility only, which user can
choose not to monitor.
After cited patch, 3rd party monitoring tools that tracks these error
message will no longer find them in dmesg, causing a regression.

With this patch, error messages are also logged into the dmesg.

Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Avoid duplicating rule destinations

Following scenario easily break driver logic and crash the kernel:
1. Add rule with mirred actions to same device.
2. Delete this rule.
In described scenario rule is not added to database and on deletion
driver access invalid entry.
Example:

$ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \
       flower skip_sw \
       action mirred egress mirror dev ens1f0_1 pipe \
       action mirred egress redirect dev ens1f0_1
$ tc filter del dev ens1f0_0 ingress protocol ip prio 1

Dmesg output:

[  376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f)
[  376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728
[  376.673433] kasan: CONFIG_KASAN_INLINE enabled
[  376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[  376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76
[  376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016
[  376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core]
[  376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d
[  376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202
[  376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60
[  376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282
[  376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6
[  376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0
[  376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0
[  376.823445] FS:  00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000
[  376.834971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0
[  376.854843] Call Trace:
[  376.868542]  __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core]
[  376.877735]  mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core]
[  376.921549]  mlx5e_flow_put+0x2b/0x50 [mlx5_core]
[  376.929813]  mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core]
[  376.973030]  tc_setup_cb_reoffload+0x29/0xc0
[  376.980619]  fl_reoffload+0x50a/0x770 [cls_flower]
[  377.015087]  tcf_block_playback_offloads+0xbd/0x250
[  377.033400]  tcf_block_setup+0x1b2/0xc60
[  377.057247]  tcf_block_offload_cmd+0x195/0x240
[  377.098826]  tcf_block_offload_unbind+0xe7/0x180
[  377.107056]  __tcf_block_put+0xe5/0x400
[  377.114528]  ingress_destroy+0x3d/0x60 [sch_ingress]
[  377.122894]  qdisc_destroy+0xf1/0x5a0
[  377.129993]  qdisc_graft+0xa3d/0xe50
[  377.151227]  tc_get_qdisc+0x48e/0xa20
[  377.165167]  rtnetlink_rcv_msg+0x35d/0x8d0
[  377.199528]  netlink_rcv_skb+0x11e/0x340
[  377.219638]  netlink_unicast+0x408/0x5b0
[  377.239913]  netlink_sendmsg+0x71b/0xb30
[  377.267505]  sock_sendmsg+0xb1/0xf0
[  377.273801]  ___sys_sendmsg+0x635/0x900
[  377.312784]  __sys_sendmsg+0xd3/0x170
[  377.338693]  do_syscall_64+0x95/0x460
[  377.344833]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  377.352321] RIP: 0033:0x7f675e58e090

To avoid this, for every mirred action check if output device was
already processed. If so - drop rule with EOPNOTSUPP error.

Signed-off-by: Dmytro Linkin <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

gpio: max77620: Add missing dependency on GPIOLIB_IRQCHIP

Driver fails to compile in a minimized kernel's configuration because of
the missing dependency on GPIOLIB_IRQCHIP.

error: ‘struct gpio_chip’ has no member named ‘irq’
44 | virq = irq_find_mapping(gpio->gpio_chip.irq.domain, offset);

Signed-off-by: Dmitry Osipenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

Merge tag 'tpmdd-next-20200106' of git://git.infradead.org/users/jjs/linux-tpmdd

Pull tpmd fixes from Jarkko Sakkinen:
"There has been a bunch of reports (e.g. [*]) reporting that when
  commit 5b359c7c4372 ("tpm_tis_core: Turn on the TPM before probing
  IRQ's") and subsequent fixes are applied it causes boot freezes on
  some machines.

  Unfortunately hardware where this causes a failure is not widely
  available (only one I'm aware is Lenovo T490), which means we cannot
  predict yet how long it will take to properly fix tpm_tis interrupt
  probing.

  Thus, the least worst short term action is to revert the code to the
  state before this commit. In long term we need fix the tpm_tis probing
  code to work on machines that Stefan's patches were supposed to fix.

  With these patches reverted nothing fatal happens, TPM is fallbacked
  to be used in polling mode (which is not in the end too bad because
  there are no high throughput workloads for TPM).

  [*] https://bugzilla.kernel.org/show_bug.cgi?id=205935"

* tag 'tpmdd-next-20200106' of git://git.infradead.org/users/jjs/linux-tpmdd:
  tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"
  tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
  tpm: Revert "tpm_tis: reserve chip for duration of tpm_tis_core_init"

bpf: Fix passing modified ctx to ld/abs/ind instruction

Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b50c ("bpf/verifier: rework value tracking").

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge tag 'linux-watchdog-5.5-fixes' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:
- fix module aliases
- fix potential build errors
- fix missing conversion of imx7ulp_wdt_enable()
- fix platform_get_irq() complaints
- fix NCT6116D support

* tag 'linux-watchdog-5.5-fixes' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: orion: fix platform_get_irq() complaints
  watchdog: rn5t618_wdt: fix module aliases
  watchdog: tqmx86_wdt: Fix build error
  watchdog: max77620_wdt: fix potential build errors
  watchdog: imx7ulp: Fix missing conversion of imx7ulp_wdt_enable()
  watchdog: w83627hf_wdt: Fix support NCT6116D

Merge branch 'atlantic-bugfixes'

Igor Russkikh says:

====================
Aquantia/Marvell atlantic bugfixes 2020/01

Here is a set of recently discovered bugfixes,
====================

Signed-off-by: David S. Miller <[email protected]>

net: atlantic: remove duplicate entries

Function entries were duplicated accidentally, removing the dups.

Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atlantic: loopback configuration in improper place

Initial loopback configuration should be called earlier, before
starting traffic on HW blocks. Otherwise depending on race conditions
it could be kept disabled.

Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atlantic: broken link status on old fw

Last code/checkpatch cleanup did a copy paste error where code from
firmware 3 API logic was moved to firmware 1 logic.

This resulted in FW1.x users would never see the link state as active.

Fixes: 7b0c342f1f67 ("net: atlantic: code style cleanup")
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpf: cgroup: prevent out-of-order release of cgroup bpf

Before commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.

But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to an use-after-free
bug.

To reproduce the issue the following script can be used:

  #!/bin/bash

  CGROOT=/sys/fs/cgroup

  mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
  sleep 1

  ./test_cgrp2_attach ${CGROOT}/A egress &
  A_PID=$!
  ./test_cgrp2_attach ${CGROOT}/B egress &
  B_PID=$!

  echo $$ > ${CGROOT}/A/C/cgroup.procs
  iperf -s &
  S_PID=$!
  iperf -c localhost -t 100 &
  C_PID=$!

  sleep 1

  echo $$ > ${CGROOT}/B/cgroup.procs
  echo ${S_PID} > ${CGROOT}/B/cgroup.procs
  echo ${C_PID} > ${CGROOT}/B/cgroup.procs

  sleep 1

  rmdir ${CGROOT}/A/C
  rmdir ${CGROOT}/A

  sleep 1

  kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}

On the unpatched kernel the following stacktrace can be obtained:

[   33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[   33.620677] #PF: supervisor read access in kernel mode
[   33.621293] #PF: error_code(0x0000) - not-present page
[   33.622754] Oops: 0000 [#1] SMP NOPTI
[   33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[   33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[   33.635809] Call Trace:
[   33.636118]  ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[   33.636728]  ? __switch_to_asm+0x40/0x70
[   33.637196]  ip_finish_output+0x68/0xa0
[   33.637654]  ip_output+0x76/0xf0
[   33.638046]  ? __ip_finish_output+0x1c0/0x1c0
[   33.638576]  __ip_queue_xmit+0x157/0x410
[   33.639049]  __tcp_transmit_skb+0x535/0xaf0
[   33.639557]  tcp_write_xmit+0x378/0x1190
[   33.640049]  ? _copy_from_iter_full+0x8d/0x260
[   33.640592]  tcp_sendmsg_locked+0x2a2/0xdc0
[   33.641098]  ? sock_has_perm+0x10/0xa0
[   33.641574]  tcp_sendmsg+0x28/0x40
[   33.641985]  sock_sendmsg+0x57/0x60
[   33.642411]  sock_write_iter+0x97/0x100
[   33.642876]  new_sync_write+0x1b6/0x1d0
[   33.643339]  vfs_write+0xb6/0x1a0
[   33.643752]  ksys_write+0xa7/0xe0
[   33.644156]  do_syscall_64+0x5b/0x1b0
[   33.644605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.

This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.

Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.

Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

firmware: tee_bnxt: Fix multiple call to tee_client_close_context

Fix calling multiple tee_client_close_context in case of shm allocation
fails.

Fixes: 246880958ac9 (“firmware: broadcom: add OP-TEE based BNXT f/w manager”)
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Preserve priority when setting CPU port.

The 6390 family uses an extended register to set the port connected to
the CPU. The lower 5 bits indicate the port, the upper three bits are
the priority of the frames as they pass through the switch, what
egress queue they should use, etc. Since frames being set to the CPU
are typically management frames, BPDU, IGMP, ARP, etc set the priority
to 7, the reset default, and the highest.

Fixes: 33641994a676 ("net: dsa: mv88e6xxx: Monitor and Management tables")
Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Chris Healy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: sxgbe: Rename Samsung to lowercase

Fix up inconsistent usage of upper and lowercase letters in "Samsung"
name.

"SAMSUNG" is not an abbreviation but a regular trademarked name.
Therefore it should be written with lowercase letters starting with
capital letter.

Although advertisement materials usually use uppercase "SAMSUNG", the
lowercase version is used in all legal aspects (e.g. on Wikipedia and in
privacy/legal statements on
https://www.samsung.com/semiconductor/privacy-global/).

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: wan: sdla: Fix cast from pointer to integer of different size

Since net_device.mem_start is unsigned long, it should not be cast to
int right before casting to pointer.  This fixes warning (compile
testing on alpha architecture):

    drivers/net/wan/sdla.c: In function ‘sdla_transmit’:
    drivers/net/wan/sdla.c:711:13: warning:
        cast to pointer from integer of different size [-Wint-to-pointer-cast]

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY

This patch is to fix a memleak caused by no place to free cmd->obj.chunk
for the unprocessed SCTP_CMD_REPLY. This issue occurs when failing to
process a cmd while there're still SCTP_CMD_REPLY cmds on the cmd seq
with an allocated chunk in cmd->obj.chunk.

So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
the cmd seq when any cmd returns error. While at it, also remove 'nomem'
label.

Reported-by: [email protected]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error

syzbot found the following crash on:
=====================================================
BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
BUG: KMSAN: uninit-value in nlmsg_parse_deprecated
include/net/netlink.h:706 [inline]
BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0
net/tipc/netlink_compat.c:215
CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x220 lib/dump_stack.c:118
  kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
  __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
  __nlmsg_parse include/net/netlink.h:661 [inline]
  nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
  __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
  tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
  tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
  tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
  genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
  genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
  genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
  netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
  genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
  netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
  netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
  netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:639 [inline]
  sock_sendmsg net/socket.c:659 [inline]
  ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
  ___sys_sendmsg net/socket.c:2384 [inline]
  __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
  __do_sys_sendmsg net/socket.c:2426 [inline]
  __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
  do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x444179
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
  kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
  kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
  slab_alloc_node mm/slub.c:2774 [inline]
  __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
  __kmalloc_reserve net/core/skbuff.c:141 [inline]
  __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
  alloc_skb include/linux/skbuff.h:1049 [inline]
  nlmsg_new include/net/netlink.h:888 [inline]
  tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
  tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
  tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
  genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
  genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
  genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
  netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
  genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
  netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
  netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
  netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:639 [inline]
  sock_sendmsg net/socket.c:659 [inline]
  ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
  ___sys_sendmsg net/socket.c:2384 [inline]
  __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
  __do_sys_sendmsg net/socket.c:2426 [inline]
  __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
  do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
=====================================================

The complaint above occurred because the memory region pointed by attrbuf
variable was not initialized. To eliminate this warning, we use kcalloc()
rather than kmalloc_array() to allocate memory for attrbuf.

Reported-by: [email protected]
Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'spi-fix-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A small collection of fixes here, one to make the newly added PTP
  timestamping code more accurate, a few driver fixes and a fix for the
  core DT binding to document the fact that we support eight wire buses"

* tag 'spi-fix-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Document Octal mode as valid SPI bus width
  spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls
  spi: spi-fsl-dspi: Fix 16-bit word order in 32-bit XSPI mode
  spi: Don't look at TX buffer for PTP system timestamping
  spi: uniphier: Fix FIFO threshold

Merge tag 'regulator-fix-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"Three small fixes here, two the result of Axel Lin's amazing work
  tracking down inconsistencies in drivers"

* tag 'regulator-fix-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: bd70528: Remove .set_ramp_delay for bd70528_ldo_ops
  regulator: axp20x: Fix axp20x_set_ramp_delay
  regulator: axp20x: Fix AXP22x ELDO2 regulator enable bitmask

chardev: Avoid potential use-after-free in 'chrdev_open()'

'chrdev_open()' calls 'cdev_get()' to obtain a reference to the
'struct cdev *' stashed in the 'i_cdev' field of the target inode
structure. If the pointer is NULL, then it is initialised lazily by
looking up the kobject in the 'cdev_map' and so the whole procedure is
protected by the 'cdev_lock' spinlock to serialise initialisation of
the shared pointer.

Unfortunately, it is possible for the initialising thread to fail *after*
installing the new pointer, for example if the subsequent '->open()' call
on the file fails. In this case, 'cdev_put()' is called, the reference
count on the kobject is dropped and, if nobody else has taken a reference,
the release function is called which finally clears 'inode->i_cdev' from
'cdev_purge()' before potentially freeing the object. The problem here
is that a racing thread can happily take the 'cdev_lock' and see the
non-NULL pointer in the inode, which can result in a refcount increment
from zero and a warning:

  |  ------------[ cut here ]------------
  |  refcount_t: addition on 0; use-after-free.
  |  WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
  |  Modules linked in:
  |  CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ #22
  |  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  |  RIP: 0010:refcount_warn_saturate+0x6d/0xf0
  |  Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08
  |  RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282
  |  RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000
  |  RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798
  |  RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039
  |  R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700
  |  R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700
  |  FS:  00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000
  |  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  |  CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0
  |  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  |  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  |  Call Trace:
  |   kobject_get+0x5c/0x60
  |   cdev_get+0x2b/0x60
  |   chrdev_open+0x55/0x220
  |   ? cdev_put.part.3+0x20/0x20
  |   do_dentry_open+0x13a/0x390
  |   path_openat+0x2c8/0x1470
  |   do_filp_open+0x93/0x100
  |   ? selinux_file_ioctl+0x17f/0x220
  |   do_sys_open+0x186/0x220
  |   do_syscall_64+0x48/0x150
  |   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  |  RIP: 0033:0x7f3b87efcd0e
  |  Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4
  |  RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
  |  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e
  |  RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c
  |  RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000
  |  R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e
  |  R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000
  |  ---[ end trace 24f53ca58db8180a ]---

Since 'cdev_get()' can already fail to obtain a reference, simply move
it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()',
which will cause the racing thread to return -ENXIO if the initialising
thread fails unexpectedly.

Cc: Hillf Danton <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Reported-by: [email protected]
Signed-off-by: Will Deacon <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serdev: Don't claim unsupported ACPI serial devices

Serdev sub-system claims all ACPI serial devices that are not already
initialised. As a result, no device node is created for serial ports
on certain boards such as the Apollo Lake based UP2. This has the
unintended consequence of not being able to raise the login prompt via
serial connection.

Introduce a blacklist to reject ACPI serial devices that should not be
claimed by serdev sub-system. Add the peripheral ids for Intel HS UART
to the blacklist to bring back serial port on SoCs carrying them.

Cc: [email protected]
Signed-off-by: Punit Agrawal <[email protected]>
Acked-by: Hans de Goede <[email protected]>
Acked-by: Johan Hovold <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'rtc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC fixes from Alexandre Belloni:
"A few fixes for this cycle. The CMOS AltCentury support broke a few
  platforms with a recent BIOS so I reverted it. The mt6397 fix is not
  that critical but good to have. And finally, the sun6i fix repairs
  WiFi and BT on a few platforms.

  Summary:

   - cmos: revert AltCentury support on AMD/Hygon

   - mt6397: fix alarm register overwrite

   - sun6i: ensure clock is working on R40"

* tag 'rtc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: cmos: Revert "rtc: Fix the AltCentury value on AMD/Hygon platform"
  rtc: mt6397: fix alarm register overwrite
  rtc: sun6i: Add support for RTC clocks on R40

Merge tag 'arc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
"Kconfig warning, stale define, duplicate asm-offset entry ..."

* tag 'arc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: pt_regs: remove hardcoded registers offset
  ARC: asm-offsets: remove duplicate entry
  ARC: mm: drop stale define of __ARCH_USE_5LEVEL_HACK
  arc: eznps: fix allmodconfig kconfig warning

arm64: Revert support for execute-only user mappings

The ARMv8 64-bit architecture supports execute-only user permissions by
clearing the PTE_USER and PTE_UXN bits, practically making it a mostly
privileged mapping but from which user running at EL0 can still execute.

The downside, however, is that the kernel at EL1 inadvertently reading
such mapping would not trip over the PAN (privileged access never)
protection.

Revert the relevant bits from commit cab15ce604e5 ("arm64: Introduce
execute-only page access permissions") so that PROT_EXEC implies
PROT_READ (and therefore PTE_USER) until the architecture gains proper
support for execute-only user mappings.

Fixes: cab15ce604e5 ("arm64: Introduce execute-only page access permissions")
Cc: <[email protected]> # 4.9.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"

There has been a bunch of reports (one from kernel bugzilla linked)
reporting that when this commit is applied it causes on some machines
boot freezes.

Unfortunately hardware where this commit causes a failure is not widely
available (only one I'm aware is Lenovo T490), which means we cannot
predict yet how long it will take to properly fix tpm_tis interrupt
probing.

Thus, the least worst short term action is to revert the code to the
state before this commit. In long term we need fix the tpm_tis probing
code to work on machines that Stefan's fix was supposed to fix.

Fixes: 21df4a8b6018 ("tpm_tis: reserve chip for duration of tpm_tis_core_init")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205935
Cc: [email protected]
Cc: Jerry Snitselaar <[email protected]>
Cc: Dan Williams <[email protected]>
Tested-by: Dan Williams <[email protected]>
Tested-by: Xiaoping Zhou <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Reported-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"

There has been a bunch of reports (one from kernel bugzilla linked)
reporting that when this commit is applied it causes on some machines
boot freezes.

Unfortunately hardware where this commit causes a failure is not widely
available (only one I'm aware is Lenovo T490), which means we cannot
predict yet how long it will take to properly fix tpm_tis interrupt
probing.

Thus, the least worst short term action is to revert the code to the
state before this commit. In long term we need fix the tpm_tis probing
code to work on machines that Stefan's fix was supposed to fix.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205935
Fixes: 1ea32c83c699 ("tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts")
Cc: [email protected]
Cc: Jerry Snitselaar <[email protected]>
Cc: Dan Williams <[email protected]>
Tested-by: Dan Williams <[email protected]>
Tested-by: Xiaoping Zhou <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Reported-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

tpm: Revert "tpm_tis: reserve chip for duration of tpm_tis_core_init"

Revert a commit, which was included in Linux v5.5-rc3 because it did not
properly fix the issues it was supposed to fix.

Fixes: 21df4a8b6018 ("tpm_tis: reserve chip for duration of tpm_tis_core_init")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205935
Cc: [email protected]
Cc: Jerry Snitselaar <[email protected]>
Cc: Dan Williams <[email protected]>
Tested-by: Dan Williams <[email protected]>
Tested-by: Xiaoping Zhou <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

Merge tag 'asoc-fix-v5.5-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.5

More fixes that have been collected, nothing super remarkable here - the
few core fixes are mainly error handling related as are many of the
driver fixes.

USB: Fix: Don't skip endpoint descriptors with maxpacket=0

It turns out that even though endpoints with a maxpacket length of 0
aren't useful for data transfer, the descriptors do serve other
purposes.  In particular, skipping them will also skip over other
class-specific descriptors for classes such as UVC.  This unexpected
side effect has caused some UVC cameras to stop working.

In addition, the USB spec requires that when isochronous endpoint
descriptors are present in an interface's altsetting 0 (which is true
on some devices), the maxpacket size _must_ be set to 0.  Warning
about such things seems like a bad idea.

This patch updates an earlier commit which would log a warning and
skip these endpoint descriptors.  Now we only log a warning, and we
don't even do that for isochronous endpoints in altsetting 0.

We don't need to worry about preventing endpoints with maxpacket = 0
from ever being used for data transfers; usb_submit_urb() already
checks for this.

Reported-and-tested-by: Roger Whittaker <[email protected]>
Fixes: d482c7bb0541 ("USB: Skip endpoints with 0 maxpacket length")
Signed-off-by: Alan Stern <[email protected]>
CC: Laurent Pinchart <[email protected]>
Link: https://marc.info/?l=linux-usb&m=157790377329882&w=2
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

i2c: bcm2835: Store pointer to bus clock

The commit bebff81fb8b9 ("i2c: bcm2835: Model Divider in CCF") introduced
a NULL pointer dereference on driver unload. It seems that we can't fetch
the bus clock via devm_clk_get in bcm2835_i2c_remove. As an alternative
approach store a pointer to the bus clock in the private driver structure.

Fixes: bebff81fb8b9 ("i2c: bcm2835: Model Divider in CCF")
Signed-off-by: Stefan Wahren <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

dt-bindings: i2c: at91: fix i2c-sda-hold-time-ns documentation for sam9x60

SAM9X60 also supports i2c-sda-hold-time-ns. Fix the documentation accordingly.

Fixes: 2034e3f4c9a5 ("dt-bindings: i2c: at91: add new compatible")
Signed-off-by: Eugen Hristev <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

i2c: at91: fix clk_offset for sam9x60

In SAM9X60 datasheet, FLEX_TWI_CWGR register description mentions clock
offset of 3 cycles (compared to 4 in eg. SAMA5D3).
This is the same offset as in SAMA5D2.

Fixes: b00277923743 ("i2c: at91: add new platform support for sam9x60")
Suggested-by: Codrin Ciubotariu <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
Acked-by: Ludovic Desroches <[email protected]>
Reviewed-by: Codrin Ciubotariu <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

netfilter: flowtable: add nf_flowtable_time_stamp

This patch adds nf_flowtable_time_stamp and updates the existing code to
use it.

This patch is also implicitly fixing up hardware statistic fetching via
nf_flow_offload_stats() where casting to u32 is missing. Use
nf_flow_timeout_delta() to fix this.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Acked-by: wenxu <[email protected]>

macb: Don't unregister clks unconditionally

The only clk init function in this driver that register a clk is
fu540_c000_clk_init(), and thus we need to unregister the clk when this
driver is removed on that platform. Other init functions, for example
macb_clk_init(), don't register clks and therefore we shouldn't
unregister the clks when this driver is removed. Convert this
registration path to devm so it gets auto-unregistered when this driver
is removed and drop the clk_unregister() calls in driver remove (and
error paths) so that we don't erroneously remove a clk from the system
that isn't registered by this driver.

Otherwise we get strange crashes with a use-after-free when the
devm_clk_get() call in macb_clk_init() calls clk_put() on a clk pointer
that has become invalid because it is freed in clk_unregister().

Cc: Nicolas Ferre <[email protected]>
Cc: Yash Shah <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Fixes: c218ad559020 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: Drop obsolete entries from Samsung sxgbe ethernet driver

The emails to [email protected] and [email protected] bounce
with 550 error code:

    host mailin.samsung.com[203.254.224.12] said: 550
    5.1.1 Recipient address rejected: User unknown (in reply to RCPT TO
    command)"

Drop Girish K S and Vipul Pandya from sxgbe maintainers entry.

Cc: Byungho An <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue

The len used for skb_put_padto is wrong, it need to add len of hdr.

In qrtr_node_enqueue, local variable size_t len is assign with
skb->len, then skb_push(skb, sizeof(*hdr)) will add skb->len with
sizeof(*hdr), so local variable size_t len is not same with skb->len
after skb_push(skb, sizeof(*hdr)).

Then the purpose of skb_put_padto(skb, ALIGN(len, 4)) is to add add
pad to the end of the skb's data if skb->len is not aligned to 4, but
unfortunately it use len instead of skb->len, at this line, skb->len
is 32 bytes(sizeof(*hdr)) more than len, for example, len is 3 bytes,
then skb->len is 35 bytes(3 + 32), and ALIGN(len, 4) is 4 bytes, so
__skb_put_padto will do nothing after check size(35) < len(4), the
correct value should be 36(sizeof(*hdr) + ALIGN(len, 4) = 32 + 4),
then __skb_put_padto will pass check size(35) < len(36) and add 1 byte
to the end of skb's data, then logic is correct.

function of skb_push:
void *skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
skb->len += len;
if (unlikely(skb->data < skb->head))
skb_under_panic(skb, len, __builtin_return_address(0));
return skb->data;
}

function of skb_put_padto
static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
{
return __skb_put_padto(skb, len, true);
}

function of __skb_put_padto
static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
bool free_on_error)
{
unsigned int size = skb->len;

if (unlikely(size < len)) {
len -= size;
if (__skb_pad(skb, len, free_on_error))
return -ENOMEM;
__skb_put(skb, len);
}
return 0;
}

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Linux 5.5-rc5

drivers/net/b44: Change to non-atomic bit operations on pwol_mask

Atomic operations that span cache lines are super-expensive on x86
(not just to the current processor, but also to other processes as all
memory operations are blocked until the operation completes). Upcoming
x86 processors have a switch to cause such operations to generate a #AC
trap. It is expected that some real time systems will enable this mode
in BIOS.

In preparation for this, it is necessary to fix code that may execute
atomic instructions with operands that cross cachelines because the #AC
trap will crash the kernel.

Since "pwol_mask" is local and never exposed to concurrency, there is
no need to set bits in pwol_mask using atomic operations.

Directly operate on the byte which contains the bit instead of using
__set_bit() to avoid any big endian concern due to type cast to
unsigned long in __set_bit().

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
"Several fixes for RISC-V:

   - Fix function graph trace support

   - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions
     with macros elsewhere in the Linux kernel tree named "IRQ_TIMER"

   - Use __pa_symbol() when computing the physical address of a kernel
     symbol, rather than __pa()

   - Mark the RISC-V port as supporting GCOV

  One DT addition:

   - Describe the L2 cache controller in the FU540 DT file

  One documentation update:

   - Add patch acceptance guideline documentation"

* tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  Documentation: riscv: add patch acceptance guidelines
  riscv: prefix IRQ_ macro names with an RV_ namespace
  clocksource: riscv: add notrace to riscv_sched_clock
  riscv: ftrace: correct the condition logic in function graph tracer
  riscv: dts: Add DT support for SiFive L2 cache controller
  riscv: gcov: enable gcov for RISC-V
  riscv: mm: use __pa_symbol for kernel symbols

netfilter: nf_tables: unbind callbacks from flowtable destroy path

Callback unbinding needs to be done after nf_flow_table_free(),
otherwise entries are not removed from the hardware.

Update nft_unregister_flowtable_net_hooks() to call
nf_unregister_net_hook() instead since the commit/abort paths do not
deal with the callback unbinding anymore.

Add a comment to nft_flowtable_event() to clarify that
flow_offload_netdev_event() already removes the entries before the
callback unbinding.

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Fixes ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Acked-by: wenxu <[email protected]>

netfilter: nf_flow_table_offload: fix the nat port mangle.

Shift on 32-bit word to define the port number depends on the flow
direction.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Fixes: 7acd9378dc652 ("netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_flow_table_offload: check the status of dst_neigh

It is better to get the dst_neigh with neigh->lock and check the
nud_state is VALID. If there is not neigh previous, the lookup will
Create a non NUD_VALID with 00:00:00:00:00:00 mac.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_flow_table_offload: fix incorrect ethernet dst address

Ethernet destination for original traffic takes the source ethernet address
in the reply direction. For reply traffic, this takes the source
ethernet address of the original direction.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_flow_offload: fix underflow in flowtable reference counter

The .deactivate and .activate interfaces already deal with the reference
counter. Otherwise, this results in spurious "Device is busy" errors.

Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Documentation: riscv: add patch acceptance guidelines

Formalize, in kernel documentation, the patch acceptance policy for
arch/riscv.  In summary, it states that as maintainers, we plan to
only accept patches for new modules or extensions that have been
frozen or ratified by the RISC-V Foundation.

We've been following these guidelines for the past few months.  In the
meantime, we've received quite a bit of feedback that it would be
helpful to have these guidelines formally documented.

Based on a suggestion from Matthew Wilcox, we also add a link to this
file to Documentation/process/index.rst, to make this document easier
to find.  The format of this document has also been changed to align
to the format outlined in the maintainer entry profiles, in accordance
with comments from Jon Corbet and Dan Williams.

Signed-off-by: Paul Walmsley <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Krste Asanovic <[email protected]>
Cc: Andrew Waterman <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>

riscv: prefix IRQ_ macro names with an RV_ namespace

"IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently
generic macro name that it's used by several source files across the
Linux code base. Some of these other files ultimately include the
arch/riscv CSR include file, causing collisions. Fix by prefixing the
RISC-V csr.h IRQ_ macro names with an RV_ prefix.

Fixes: a4c3733d32a72 ("riscv: abstract out CSR names for supervisor vs machine mode")
Reported-by: Olof Johansson <[email protected]>
Acked-by: Olof Johansson <[email protected]>
Signed-off-by: Paul Walmsley <[email protected]>

clocksource: riscv: add notrace to riscv_sched_clock

When enabling ftrace graph tracer, it gets the tracing clock in
ftrace_push_return_trace().  Eventually, it invokes riscv_sched_clock()
to get the clock value.  If riscv_sched_clock() isn't marked with
'notrace', it will call ftrace_push_return_trace() and cause infinite
loop.

The result of failure as follow:

command: echo function_graph >current_tracer
[   46.176787] Unable to handle kernel paging request at virtual address ffffffe04fb38c48
[   46.177309] Oops [#1]
[   46.177478] Modules linked in:
[   46.177770] CPU: 0 PID: 256 Comm: $d Not tainted 5.5.0-rc1 #47
[   46.177981] epc: ffffffe00035e59a ra : ffffffe00035e57e sp : ffffffe03a7569b0
[   46.178216]  gp : ffffffe000d29b90 tp : ffffffe03a756180 t0 : ffffffe03a756968
[   46.178430]  t1 : ffffffe00087f408 t2 : ffffffe03a7569a0 s0 : ffffffe03a7569f0
[   46.178643]  s1 : ffffffe00087f408 a0 : 0000000ac054cda4 a1 : 000000000087f411
[   46.178856]  a2 : 0000000ac054cda4 a3 : 0000000000373ca0 a4 : ffffffe04fb38c48
[   46.179099]  a5 : 00000000153e22a8 a6 : 00000000005522ff a7 : 0000000000000005
[   46.179338]  s2 : ffffffe03a756a90 s3 : ffffffe00032811c s4 : ffffffe03a756a58
[   46.179570]  s5 : ffffffe000d29fe0 s6 : 0000000000000001 s7 : 0000000000000003
[   46.179809]  s8 : 0000000000000003 s9 : 0000000000000002 s10: 0000000000000004
[   46.180053]  s11: 0000000000000000 t3 : 0000003fc815749c t4 : 00000000000efc90
[   46.180293]  t5 : ffffffe000d29658 t6 : 0000000000040000
[   46.180482] status: 0000000000000100 badaddr: ffffffe04fb38c48 cause: 000000000000000f

Signed-off-by: Zong Li <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
[[email protected]: cleaned up patch description]
Fixes: 92e0d143fdef ("clocksource/drivers/riscv_timer: Provide the sched_clock")
Cc: [email protected]
Signed-off-by: Paul Walmsley <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"17 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  hexagon: define ioremap_uc
  ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
  ocfs2: call journal flush to mark journal as empty after journal recovery when mount
  mm/hugetlb: defer freeing of huge pages if in non-task context
  mm/gup: fix memory leak in __gup_benchmark_ioctl
  mm/oom: fix pgtables units mismatch in Killed process message
  fs/posix_acl.c: fix kernel-doc warnings
  hexagon: work around compiler crash
  hexagon: parenthesize registers in asm predicates
  fs/namespace.c: make to_mnt_ns() static
  fs/nsfs.c: include headers for missing declarations
  fs/direct-io.c: include fs/internal.h for missing prototype
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  kcov: fix struct layout for kcov_remote_arg
  mm/zsmalloc.c: fix the migrated zspage statistics.
  mm/memory_hotplug: shrink zones when offlining memory

Merge tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor fixes from John Johansen:

- performance regression: only get a label reference if the fast path
   check fails

- fix aa_xattrs_match() may sleep while holding a RCU lock

- fix bind mounts aborting with -ENOMEM

* tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  apparmor: only get a label reference if the fast path check fails
  apparmor: fix bind mounts aborting with -ENOMEM

block: remove unused mp_bvec_last_segment

After commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
this function is unused, remove it.

Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock

aa_xattrs_match() is unfortunately calling vfs_getxattr_alloc() from a
context protected by an rcu_read_lock. This can not be done as
vfs_getxattr_alloc() may sleep regardles of the gfp_t value being
passed to it.

Fix this by breaking the rcu_read_lock on the policy search when the
xattr match feature is requested and restarting the search if a policy
changes occur.

Fixes: 8e51f9087f40 ("apparmor: Add support for attaching profiles via xattr, presence and value")
Reported-by: Jia-Ju Bai <[email protected]>
Reported-by: Al Viro <[email protected]>
Signed-off-by: John Johansen <[email protected]>

Merge tag 'mips_fixes_5.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
"A collection of MIPS fixes:

   - Fill the struct cacheinfo shared_cpu_map field with sensible
     values, notably avoiding issues with perf which was unhappy in the
     absence of these values.

   - A boot fix for Loongson 2E & 2F machines which was fallout from
     some refactoring performed this cycle.

   - A Kconfig dependency fix for the Loongson CPU HWMon driver.

   - A couple of VDSO fixes, ensuring gettimeofday() behaves
     appropriately for kernel configurations that don't include support
     for a clocksource the VDSO can use & fixing the calling convention
     for the n32 & n64 VDSOs which would previously clobber the $gp/$28
     register.

   - A build fix for vmlinuz compressed images which were
     inappropriately building with -fsanitize-coverage despite not being
     part of the kernel proper, then failing to link due to the missing
     __sanitizer_cov_trace_pc() function.

   - A couple of eBPF JIT fixes, including disabling it for MIPS32 due
     to a large number of issues with the code generated there &
     reflecting ISA dependencies in Kconfig to enforce that systems
     which don't support the JIT must include the interpreter"

* tag 'mips_fixes_5.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Avoid VDSO ABI breakage due to global register variable
  MIPS: BPF: eBPF JIT: check for MIPS ISA compliance in Kconfig
  MIPS: BPF: Disable MIPS32 eBPF JIT
  MIPS: Prevent link failure with kcov instrumentation
  MIPS: Kconfig: Use correct form for 'depends on'
  mips: Fix gettimeofday() in the vdso library
  MIPS: Fix boot on Fuloong2 systems
  mips: cacheinfo: report shared CPU map

hexagon: define ioremap_uc

Similar to commit 38e45d81d14e ("sparc64: implement ioremap_uc") define
ioremap_uc for hexagon to avoid errors from
-Wimplicit-function-definition.

Link: http://lkml.kernel.org/r/[email protected]
Link: https://github.com/ClangBuiltLinux/linux/issues/797
Fixes: e537654b7039 ("lib: devres: add a helper function for ioremap_uc")
Signed-off-by: Nick Desaulniers <[email protected]>
Suggested-by: Nathan Chancellor <[email protected]>
Acked-by: Brian Cain <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Tuowen Zhao <[email protected]>
Cc: Mika Westerberg <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Alexios Zavras <[email protected]>
Cc: Allison Randal <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Richard Fontana <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less

Because ocfs2_get_dlm_debug() function is called once less here, ocfs2
file system will trigger the system crash, usually after ocfs2 file
system is unmounted.

This system crash is caused by a generic memory corruption, these crash
backtraces are not always the same, for exapmle,

    ocfs2: Unmounting device (253,16) on (node 172167785)
    general protection fault: 0000 [#1] SMP PTI
    CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    RIP: 0010:__kmalloc+0xa5/0x2a0
    Code: 00 00 4d 8b 07 65 4d 8b
    RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
    RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
    R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
    R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
    FS:  00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
    Call Trace:
     ext4_htree_store_dirent+0x35/0x100 [ext4]
     htree_dirblock_to_tree+0xea/0x290 [ext4]
     ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
     ext4_readdir+0x67c/0x9d0 [ext4]
     iterate_dir+0x8d/0x1a0
     __x64_sys_getdents+0xab/0x130
     do_syscall_64+0x60/0x1f0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f699d33a9fb

This regression problem was introduced by commit e581595ea29c ("ocfs: no
need to check return value of debugfs_create functions").

Link: http://lkml.kernel.org/r/[email protected]
Fixes: e581595ea29c ("ocfs: no need to check return value of debugfs_create functions")
Signed-off-by: Gang He <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]> [5.3+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: call journal flush to mark journal as empty after journal recovery when mount

If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset).  If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
  Block 0: Journal Superblock
  Seq: 0   Type: 4 (JBD2_SUPERBLOCK_V2)
  Blocksize: 4096   Total Blocks: 32768   First Block: 1
  First Commit ID: 13   Start Log Blknum: 1
  Error: 0
  Feature Compat: 0
  Feature Incompat: 2 block64
  Feature RO compat: 0
  Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
  FS Share Cnt: 1   Dynamic Superblk Blknum: 0
  Per Txn Block Limit    Journal: 0    Data: 0

  Block 1: Journal Commit Block
  Seq: 14   Type: 2 (JBD2_COMMIT_BLOCK)

  Block 2: Journal Descriptor
  Seq: 15   Type: 1 (JBD2_DESCRIPTOR_BLOCK)
  No. Blocknum        Flags
   0. 587             none
  UUID: 00000000000000000000000000000000
   1. 8257792         JBD2_FLAG_SAME_UUID
   2. 619             JBD2_FLAG_SAME_UUID
   3. 24772864        JBD2_FLAG_SAME_UUID
   4. 8257802         JBD2_FLAG_SAME_UUID
   5. 513             JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
  ...
  Block 7: Inode
  Inode: 8257802   Mode: 0640   Generation: 57157641 (0x3682809)
  FS Generation: 2839773110 (0xa9437fb6)
  CRC32: 00000000   ECC: 0000
  Type: Regular   Attr: 0x0   Flags: Valid
  Dynamic Features: (0x1) InlineData
  User: 0 (root)   Group: 0 (root)   Size: 7
  Links: 1   Clusters: 0
  ctime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  atime: 0x5de5d870 0x113181a1 -- Tue Dec  3 11:37:20.288457121 2019
  mtime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  dtime: 0x0 -- Thu Jan  1 08:00:00 1970
  ...
  Block 9: Journal Commit Block
  Seq: 15   Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
  ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
  fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kai Li <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Changwei Ge <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hugetlb: defer freeing of huge pages if in non-task context

The following lockdep splat was observed when a certain hugetlbfs test
was run:

  ================================
  WARNING: inconsistent lock state
  4.18.0-159.el8.x86_64+debug #1 Tainted: G        W --------- -  -
  --------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  swapper/30/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  ffffffff9acdc038 (hugetlb_lock){+.?.}, at: free_huge_page+0x36f/0xaa0
  {SOFTIRQ-ON-W} state was registered at:
    lock_acquire+0x14f/0x3b0
    _raw_spin_lock+0x30/0x70
    __nr_hugepages_store_common+0x11b/0xb30
    hugetlb_sysctl_handler_common+0x209/0x2d0
    proc_sys_call_handler+0x37f/0x450
    vfs_write+0x157/0x460
    ksys_write+0xb8/0x170
    do_syscall_64+0xa5/0x4d0
    entry_SYSCALL_64_after_hwframe+0x6a/0xdf
  irq event stamp: 691296
  hardirqs last  enabled at (691296): [<ffffffff99bb034b>] _raw_spin_unlock_irqrestore+0x4b/0x60
  hardirqs last disabled at (691295): [<ffffffff99bb0ad2>] _raw_spin_lock_irqsave+0x22/0x81
  softirqs last  enabled at (691284): [<ffffffff97ff0c63>] irq_enter+0xc3/0xe0
  softirqs last disabled at (691285): [<ffffffff97ff0ebe>] irq_exit+0x23e/0x2b0

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(hugetlb_lock);
    <Interrupt>
      lock(hugetlb_lock);

   *** DEADLOCK ***
      :
  Call Trace:
   <IRQ>
   __lock_acquire+0x146b/0x48c0
   lock_acquire+0x14f/0x3b0
   _raw_spin_lock+0x30/0x70
   free_huge_page+0x36f/0xaa0
   bio_check_pages_dirty+0x2fc/0x5c0
   clone_endio+0x17f/0x670 [dm_mod]
   blk_update_request+0x276/0xe50
   scsi_end_request+0x7b/0x6a0
   scsi_io_completion+0x1c6/0x1570
   blk_done_softirq+0x22e/0x350
   __do_softirq+0x23d/0xad8
   irq_exit+0x23e/0x2b0
   do_IRQ+0x11a/0x200
   common_interrupt+0xf/0xf
   </IRQ>

Both the hugetbl_lock and the subpool lock can be acquired in
free_huge_page().  One way to solve the problem is to make both locks
irq-safe.  However, Mike Kravetz had learned that the hugetlb_lock is
held for a linear scan of ALL hugetlb pages during a cgroup reparentling
operation.  So it is just too long to have irq disabled unless we can
break hugetbl_lock down into finer-grained locks with shorter lock hold
times.

Another alternative is to defer the freeing to a workqueue job.  This
patch implements the deferred freeing by adding a free_hpage_workfn()
work function to do the actual freeing.  The free_huge_page() call in a
non-task context saves the page to be freed in the hpage_freelist linked
list in a lockless manner using the llist APIs.

The generic workqueue is used to process the work, but a dedicated
workqueue can be used instead if it is desirable to have the huge page
freed ASAP.

Thanks to Kirill Tkhai <[email protected]> for suggesting the use of
llist APIs which simplfy the code.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: fix memory leak in __gup_benchmark_ioctl

In the implementation of __gup_benchmark_ioctl() the allocated pages
should be released before returning in case of an invalid cmd. Release
pages via kvfree().

[[email protected]: rework code flow, return -EINVAL rather than -1]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 714a3a1ebafe ("mm/gup_benchmark.c: add additional pinning methods")
Signed-off-by: Navid Emamdoost <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/oom: fix pgtables units mismatch in Killed process message

pr_err() expects kB, but mm_pgtables_bytes() returns the number of bytes.
As everything else is printed in kB, I chose to fix the value rather than
the string.

Before:

[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
...
[   1878]  1000  1878   217253   151144  1269760        0             0 python
...
Out of memory: Killed process 1878 (python) total-vm:869012kB, anon-rss:604572kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:1269760kB oom_score_adj:0

After:

[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
...
[   1436]  1000  1436   217253   151890  1294336        0             0 python
...
Out of memory: Killed process 1436 (python) total-vm:869012kB, anon-rss:607516kB, file-rss:44kB, shmem-rss:0kB, UID:1000 pgtables:1264kB oom_score_adj:0

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 70cb6d267790 ("mm/oom: add oom_score_adj and pgtables to Killed process message")
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Edward Chron <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/posix_acl.c: fix kernel-doc warnings

Fix kernel-doc warnings in fs/posix_acl.c.
Also fix one typo (setgit -> setgid).

  fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file permissions")
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Andreas Gruenbacher <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hexagon: work around compiler crash

Clang cannot translate the string "r30" into a valid register yet.

Link: https://github.com/ClangBuiltLinux/linux/issues/755
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nick Desaulniers <[email protected]>
Suggested-by: Sid Manning <[email protected]>
Reviewed-by: Brian Cain <[email protected]>
Cc: Allison Randal <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Richard Fontana <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hexagon: parenthesize registers in asm predicates

Hexagon requires that register predicates in assembly be parenthesized.

Link: https://github.com/ClangBuiltLinux/linux/issues/754
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nick Desaulniers <[email protected]>
Suggested-by: Sid Manning <[email protected]>
Acked-by: Brian Cain <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Tuowen Zhao <[email protected]>
Cc: Mika Westerberg <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Alexios Zavras <[email protected]>
Cc: Allison Randal <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Richard Fontana <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/namespace.c: make to_mnt_ns() static

Make to_mnt_ns() static to address the following 'sparse' warning:

fs/namespace.c:1731:22: warning: symbol 'to_mnt_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/nsfs.c: include headers for missing declarations

Include linux/proc_fs.h and fs/internal.h to address the following
'sparse' warnings:

fs/nsfs.c:41:32: warning: symbol 'ns_dentry_operations' was not declared. Should it be static?
fs/nsfs.c:145:5: warning: symbol 'open_related_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/direct-io.c: include fs/internal.h for missing prototype

Include fs/internal.h to address the following 'sparse' warning:

fs/direct-io.c:591:5: warning: symbol 'sb_init_dio_done_wq' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: move_pages: return valid node id in status if the page is already on the target node

Felix Abecassis reports move_pages() would return random status if the
pages are already on the target node by the below test program:

  int main(void)
  {
const long node_id = 1;
const long page_size = sysconf(_SC_PAGESIZE);
const int64_t num_pages = 8;

unsigned long nodemask =  1 << node_id;
long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
if (ret < 0)
return (EXIT_FAILURE);

void **pages = malloc(sizeof(void*) * num_pages);
for (int i = 0; i < num_pages; ++i) {
pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
-1, 0);
if (pages[i] == MAP_FAILED)
return (EXIT_FAILURE);
}

ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
if (ret < 0)
return (EXIT_FAILURE);

int *nodes = malloc(sizeof(int) * num_pages);
int *status = malloc(sizeof(int) * num_pages);
for (int i = 0; i < num_pages; ++i) {
nodes[i] = node_id;
status[i] = 0xd0; /* simulate garbage values */
}

ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
printf("move_pages: %ld\n", ret);
for (int i = 0; i < num_pages; ++i)
printf("status[%d] = %d\n", i, status[i]);
  }

Then running the program would return nonsense status values:

  $ ./move_pages_bug
  move_pages: 0
  status[0] = 208
  status[1] = 208
  status[2] = 208
  status[3] = 208
  status[4] = 208
  status[5] = 208
  status[6] = 208
  status[7] = 208

This is because the status is not set if the page is already on the
target node, but move_pages() should return valid status as long as it
succeeds.  The valid status may be errno or node id.

We can't simply initialize status array to zero since the pages may be
not on node 0.  Fix it by updating status with node id which the page is
already on.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Signed-off-by: Yang Shi <[email protected]>
Reported-by: Felix Abecassis <[email protected]>
Tested-by: Felix Abecassis <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]> [4.17+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg: account security cred as well to kmemcg

The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage on our production and on further
inspection, we found a buggy application leaking processes.  Though that
buggy application was contained within its memcg but we observe much
more system memory overhead, couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Chris Down <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kcov: fix struct layout for kcov_remote_arg

Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
This makes it more convenient to write userspace apps that can be
compiled into 32-bit or 64-bit binaries and still work with the same
64-bit kernel.

Also use proper __u32 types in uapi headers instead of unsigned ints.

Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
Fixes: eec028c9386ed1a ("kcov: remote coverage support")
Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Chunfeng Yun <[email protected]>
Cc: "Jacky . Cao @ sony . com" <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/zsmalloc.c: fix the migrated zspage statistics.

When zspage is migrated to the other zone, the zone page state should be
updated as well, otherwise the NR_ZSPAGE for each zone shows wrong
counts including proc/zoneinfo in practice.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 91537fee0013 ("mm: add NR_ZSMALLOC to vmstat")
Signed-off-by: Chanho Min <[email protected]>
Signed-off-by: Jinsuk Choi <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: <[email protected]> [4.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: shrink zones when offlining memory

We currently try to shrink a single zone when removing memory.  We use
the zone of the first page of the memory we are removing.  If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __remove_pages+0x4b/0x640
     arch_remove_memory+0x63/0x8d
     try_remove_memory+0xdb/0x130
     __remove_memory+0xa/0x11
     acpi_memory_device_remove+0x70/0x100
     acpi_bus_trim+0x55/0x90
     acpi_device_hotplug+0x227/0x3a0
     acpi_hotplug_work_fn+0x1a/0x30
     process_one_work+0x221/0x550
     worker_thread+0x50/0x3b0
     kthread+0x105/0x140
     ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that.  We now
properly shrink the zones, even if we have DIMMs whereby

- Some memory blocks fall into no zone (never onlined)

- Some memory blocks fall into multiple zones (offlined+re-onlined)

- Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Link: http://lkml.kernel.org/r/[email protected]
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: <[email protected]> [5.0+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'dmaengine-fix-5.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"A bunch of fixes for:

   - uninitialized dma_slave_caps access

   - virt-dma use after free in vchan_complete()

   - driver fixes for ioat, k3dma and jz4780"

* tag 'dmaengine-fix-5.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  ioat: ioat_alloc_ring() failure handling.
  dmaengine: virt-dma: Fix access after free in vchan_complete()
  dmaengine: k3dma: Avoid null pointer traversal
  dmaengine: dma-jz4780: Also break descriptor chains on JZ4725B
  dmaengine: Fix access to uninitialized dma_slave_caps

Merge tag 'media/v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- some fixes at CEC core to comply with HDMI 2.0 specs and fix some
   border cases

- a fix at the transmission logic of the pulse8-cec driver

- one alignment fix on a data struct at ipu3 when built with 32 bits

* tag 'media/v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: intel-ipu3: Align struct ipu3_uapi_awb_fr_config_s to 32 bytes
  media: pulse8-cec: fix lost cec_transmit_attempt_done() call
  media: cec: check 'transmit_in_progress', not 'transmitting'
  media: cec: avoid decrementing transmit_queue_sz if it is 0
  media: cec: CEC 2.0-only bcast messages were ignored

ALSA: usb-audio: Apply the sample rate quirk for Bose Companion 5

Bose Companion 5 (with USB ID 05a7:1020) doesn't seem supporting
reading back the sample rate, so the existing quirk is needed.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206063
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek - Add new codec supported for ALCS1200A

Add ALCS1200A supported.
It was similar as ALC900.

Signed-off-by: Kailang Yang <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

rtc: cmos: Revert "rtc: Fix the AltCentury value on AMD/Hygon platform"

There are multiple reports of this patch breaking RTC time setting for AMD
platforms.

This reverts commit 7ad295d5196a58c22abecef62dd4f99e2f86e831.

Cc: Jinke Fan <[email protected]>
Link: https://lore.kernel.org/r/CABXGCsMLob0DC25JS8wwAYydnDoHBSoMh2_YLPfqm3TTvDE-Zw@mail.gmail.com
Fixes: 7ad295d5196a ("rtc: Fix the AltCentury value on AMD/Hygon platform")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>

Merge tag 'gpio-fixes-for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes

gpio fixes for v5.5-rc5

- fix coding style issues introduced in gpio-mockup during merge window