Git Repo - linux.git/log

aio: simplify - and fix - fget/fput for io_submit()

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

"In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <[email protected]>
Reported-by: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

cxgb4/chtls: Prefix adapter flags with CXGB4

Some of these macros were conflicting with global namespace,
hence prefixing them with CXGB4.

Signed-off-by: Arjun Vynipadath <[email protected]>
Signed-off-by: Vishal Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-sysfs: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mellanox: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-03-04

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
   AF_XDP applications by offering higher-level APIs that hide many
   of the details of the AF_XDP uapi. Sample programs are converted
   over to this new interface as well, from Magnus.

2) Introduce a new cant_sleep() macro for annotation of functions
   that cannot sleep and use it in BPF_PROG_RUN() to assert that
   BPF programs run under preemption disabled context, from Peter.

3) Introduce per BPF prog stats in order to monitor the usage
   of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
   knob where monitoring tools can make use of this to efficiently
   determine the average cost of programs, from Alexei.

4) Split up BPF selftest's test_progs similarly as we already
   did with test_verifier. This allows to further reduce merge
   conflicts in future and to get more structure into our
   quickly growing BPF selftest suite, from Stanislav.

5) Fix a bug in BTF's dedup algorithm which can cause an infinite
   loop in some circumstances; also various BPF doc fixes and
   improvements, from Andrii.

6) Various BPF sample cleanups and migration to libbpf in order
   to further isolate the old sample loader code (so we can get
   rid of it at some point), from Jakub.

7) Add a new BPF helper for BPF cgroup skb progs that allows
   to set ECN CE code point and a Host Bandwidth Manager (HBM)
   sample program for limiting the bandwidth used by v2 cgroups,
   from Lawrence.

8) Enable write access to skb->queue_mapping from tc BPF egress
   programs in order to let BPF pick TX queue, from Jesper.

9) Fix a bug in BPF spinlock handling for map-in-map which did
   not propagate spin_lock_off to the meta map, from Yonghong.

10) Fix a bug in the new per-CPU BPF prog counters to properly
    initialize stats for each CPU, from Eric.

11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
    from Willem.

12) Fix various BPF samples bugs in XDP and tracing progs,
    from Toke, Daniel and Yonghong.

13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
    enforces it now everywhere, from Anders.

14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
    get error handling working, from Dan.

15) Fix bpftool documentation and auto-completion with regards
    to stream_{verdict,parser} attach types, from Alban.
====================

Signed-off-by: David S. Miller <[email protected]>

x86-64: add warning for non-canonical user access address dereferences

This adds a warning (once) for any kernel dereference that has a user
exception handler, but accesses a non-canonical address.  It basically
is a simpler - and more limited - version of commit 9da3f2b74054
("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that
got reverted.

Note that unlike that original commit, this only causes a warning,
because there are real situations where we currently can do this
(notably speculative argument fetching for uprobes etc).  Also, unlike
that original commit, this _only_ triggers for #GP accesses, so the
cases of valid kernel pointers that cross into a non-mapped page aren't
affected.

The intent of this is two-fold:

- the uprobe/tracing accesses really do need to be more careful. In
   particular, from a portability standpoint it's just wrong to think
   that "a pointer is a pointer", and use the same logic for any random
   pointer value you find on the stack. It may _work_ on x86-64, but it
   doesn't necessarily work on other architectures (where the same
   pointer value can be either a kernel pointer _or_ a user pointer, and
   you really need to be much more careful in how you try to access it)

   The warning can hopefully end up being a reminder that just any
   random pointer access won't do.

- Kees in particular wanted a way to actually report invalid uses of
   wild pointers to user space accessors, instead of just silently
   failing them. Automated fuzzers want a way to get reports if the
   kernel ever uses invalid values that the fuzzer fed it.

   The non-canonical address range is a fair chunk of the address space,
   and with this you can teach syzkaller to feed in invalid pointer
   values and find cases where we do not properly validate user
   addresses (possibly due to bad uses of "set_fs()").

Acked-by: Kees Cook <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: Introduce test_stackinit module

Adds test for stack initialization coverage. We have several build options
that control the level of stack variable initialization. This test lets us
visualize which options cover which cases, and provide tests for some of
the pathological padding conditions the compiler will sometimes fail to
initialize.

All options pass the explicit initialization cases and the partial
initializers (even with padding):

test_stackinit: u8_zero ok
test_stackinit: u16_zero ok
test_stackinit: u32_zero ok
test_stackinit: u64_zero ok
test_stackinit: char_array_zero ok
test_stackinit: small_hole_zero ok
test_stackinit: big_hole_zero ok
test_stackinit: trailing_hole_zero ok
test_stackinit: packed_zero ok
test_stackinit: small_hole_dynamic_partial ok
test_stackinit: big_hole_dynamic_partial ok
test_stackinit: trailing_hole_dynamic_partial ok
test_stackinit: packed_dynamic_partial ok
test_stackinit: small_hole_static_partial ok
test_stackinit: big_hole_static_partial ok
test_stackinit: trailing_hole_static_partial ok
test_stackinit: packed_static_partial ok
test_stackinit: packed_static_all ok
test_stackinit: packed_dynamic_all ok
test_stackinit: packed_runtime_all ok

The results of the other tests (which contain no explicit initialization),
change based on the build's configured compiler instrumentation.

No options:

test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_runtime_partial FAIL (uninit bytes: 23)
test_stackinit: big_hole_runtime_partial FAIL (uninit bytes: 127)
test_stackinit: trailing_hole_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: packed_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: small_hole_runtime_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_runtime_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_runtime_all FAIL (uninit bytes: 7)
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none FAIL (uninit bytes: 24)
test_stackinit: big_hole_none FAIL (uninit bytes: 128)
test_stackinit: trailing_hole_none FAIL (uninit bytes: 32)
test_stackinit: packed_none FAIL (uninit bytes: 32)
test_stackinit: user FAIL (uninit bytes: 32)
test_stackinit: failures: 25

CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y
This only tries to initialize structs with __user markings, so
only the difference from above is now the "user" test passes:

test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_runtime_partial FAIL (uninit bytes: 23)
test_stackinit: big_hole_runtime_partial FAIL (uninit bytes: 127)
test_stackinit: trailing_hole_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: packed_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: small_hole_runtime_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_runtime_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_runtime_all FAIL (uninit bytes: 7)
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none FAIL (uninit bytes: 24)
test_stackinit: big_hole_none FAIL (uninit bytes: 128)
test_stackinit: trailing_hole_none FAIL (uninit bytes: 32)
test_stackinit: packed_none FAIL (uninit bytes: 32)
test_stackinit: user ok
test_stackinit: failures: 24

CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y
This initializes all structures passed by reference (scalars and strings
remain uninitialized):

test_stackinit: small_hole_static_all ok
test_stackinit: big_hole_static_all ok
test_stackinit: trailing_hole_static_all ok
test_stackinit: small_hole_dynamic_all ok
test_stackinit: big_hole_dynamic_all ok
test_stackinit: trailing_hole_dynamic_all ok
test_stackinit: small_hole_runtime_partial ok
test_stackinit: big_hole_runtime_partial ok
test_stackinit: trailing_hole_runtime_partial ok
test_stackinit: packed_runtime_partial ok
test_stackinit: small_hole_runtime_all ok
test_stackinit: big_hole_runtime_all ok
test_stackinit: trailing_hole_runtime_all ok
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none ok
test_stackinit: big_hole_none ok
test_stackinit: trailing_hole_none ok
test_stackinit: packed_none ok
test_stackinit: user ok
test_stackinit: failures: 7

CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
This initializes all variables, so it matches above with the scalars
and arrays included:

test_stackinit: small_hole_static_all ok
test_stackinit: big_hole_static_all ok
test_stackinit: trailing_hole_static_all ok
test_stackinit: small_hole_dynamic_all ok
test_stackinit: big_hole_dynamic_all ok
test_stackinit: trailing_hole_dynamic_all ok
test_stackinit: small_hole_runtime_partial ok
test_stackinit: big_hole_runtime_partial ok
test_stackinit: trailing_hole_runtime_partial ok
test_stackinit: packed_runtime_partial ok
test_stackinit: small_hole_runtime_all ok
test_stackinit: big_hole_runtime_all ok
test_stackinit: trailing_hole_runtime_all ok
test_stackinit: u8_none ok
test_stackinit: u16_none ok
test_stackinit: u32_none ok
test_stackinit: u64_none ok
test_stackinit: char_array_none ok
test_stackinit: switch_1_none ok
test_stackinit: switch_2_none ok
test_stackinit: small_hole_none ok
test_stackinit: big_hole_none ok
test_stackinit: trailing_hole_none ok
test_stackinit: packed_none ok
test_stackinit: user ok
test_stackinit: all tests passed!

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>

gcc-plugins: structleak: Generalize to all variable types

This adjusts structleak to also work with non-struct types when they
are passed by reference, since those variables may leak just like
anything else. This is exposed via an improved set of Kconfig options.
(This does mean structleak is slightly misnamed now.)

Building with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL should give the
kernel complete initialization coverage of all stack variables passed
by reference, including padding (see lib/test_stackinit.c).

Using CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE to count added initializations
under defconfig:

..._BYREF:      5945 added initializations
..._BYREF_ALL: 16606 added initializations

There is virtually no change to text+data size (both have less than 0.05%
growth):

   text    data     bss     dec     hex filename
19502103        5051456 1917000 26470559        193e89f vmlinux.stock
19513412        5051456 1908808 26473676        193f4cc vmlinux.byref
19516974        5047360 1900616 26464950        193d2b6 vmlinux.byref_all

The measured performance difference is in the noise for hackbench and
kernel build benchmarks:

Stock:

5x hackbench -g 20 -l 1000
Mean:   10.649s
Std Dev: 0.339

5x kernel build (4-way parallel)
Mean:  261.98s
Std Dev: 1.53

CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF:

5x hackbench -g 20 -l 1000
Mean:   10.540s
Std Dev: 0.233

5x kernel build (4-way parallel)
Mean:  260.52s
Std Dev: 1.31

CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL:

5x hackbench -g 20 -l 1000
Mean:   10.320
Std Dev: 0.413

5x kernel build (4-way parallel)
Mean:  260.10
Std Dev: 0.86

This does not yet solve missing padding initialization for structures
on the stack that are never passed by reference (which should be a tiny
minority). Hopefully this will be more easily addressed by upstream
compiler fixes after clarifying the C11 padding initialization
specification.

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>

printk/docs: Add extra integer types to printk-formats

A few commonly used integer types were absent from this table, so add
them.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Suggested-by: Nick Desaulniers <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Louis Taylor <[email protected]>
Signed-off-by: Louis Taylor <[email protected]>
[[email protected]: sorted both variants the same way by size]
Signed-off-by: Petr Mladek <[email protected]>

Merge branch 'spi-5.1' into spi-next

Merge branch 'spi-5.0' into spi-linus

Merge branch 'regulator-5.1' into regulator-next

Merge branch 'regulator-5.0' into regulator-linus

printk: Remove no longer used LOG_PREFIX.

When commit 5becfb1df5ac8e49 ("kmsg: merge continuation records while
printing") introduced LOG_PREFIX, we used KERN_DEFAULT etc. as a flag
for setting LOG_PREFIX in order to tell whether to call cont_add()
(i.e. whether to append the message to "struct cont").

But since commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for
printing continuation lines") inverted the behavior (i.e. don't append
the message to "struct cont" unless KERN_CONT is specified) and commit
5aa068ea4082b39e ("printk: remove games with previous record flags")
removed the last LOG_PREFIX check, setting LOG_PREFIX via KERN_DEFAULT
etc. is no longer meaningful.

Therefore, we can remove LOG_PREFIX and make KERN_DEFAULT empty string.

Link: http://lkml.kernel.org/r/1550829580-9189-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
To: Steven Rostedt <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>

Merge branch 'pm-opp'

* pm-opp:
  cpufreq: OMAP: Register an Energy Model
  cpufreq: imx6q: Register an Energy Model
  opp: no need to check return value of debugfs_create functions
  cpufreq: mediatek: Register an Energy Model
  cpufreq: scmi: Register an Energy Model
  cpufreq: arm_big_little: Register an Energy Model
  cpufreq: scpi: Register an Energy Model
  cpufreq: dt: Register an Energy Model

Merge branch 'pm-cpufreq'

* pm-cpufreq: (48 commits)
  cpufreq: kryo: Release OPP tables on module removal
  cpufreq: ap806: add missing of_node_put after of_device_is_available
  cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
  cpufreq: Pass updated policy to driver ->setpolicy() callback
  cpufreq: Fix two debug messages in cpufreq_set_policy()
  cpufreq: Reorder and simplify cpufreq_update_policy()
  cpufreq: Add kerneldoc comments for two core functions
  cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
  cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
  cpufreq: intel_pstate: Avoid redundant initialization of local vars
  cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
  ACPI / CPPC: Add a helper to get desired performance
  cpufreq: davinci: move configuration to include/linux/platform_data
  cpufreq: speedstep: convert BUG() to BUG_ON()
  cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
  cpufreq: longhaul: remove unneeded semicolon
  cpufreq: pcc-cpufreq: remove unneeded semicolon
  cpufreq: Replace double NOT (!!) with single NOT (!)
  cpufreq: intel_pstate: Add reasons for failure and debug messages
  cpufreq: dt: Implement online/offline() callbacks
  ...

Merge branches 'pm-cpuidle' and 'powercap'

* pm-cpuidle:
  ACPI / processor: Set P_LVL{2,3} idle state descriptions
  intel_idle: add support for Jacobsville
  cpuidle: dt: bail out if the idle-state DT node is not compatible
  cpuidle: use BIT() for idle state flags and remove CPUIDLE_DRIVER_FLAGS_MASK
  Documentation: driver-api: PM: Add cpuidle document
  cpuidle: New timer events oriented governor for tickless systems

* powercap:
  powercap/intel_rapl: add Ice Lake mobile
  powercap: intel_rapl: add support for Jacobsville

Merge branches 'pm-core', 'pm-sleep', 'pm-qos', 'pm-domains' and 'pm-em'

* pm-core:
  PM / core: Add support to skip power management in device/driver model
  PM / suspend: Print debug messages for device using direct-complete
  PM-runtime: update time accounting only when enabled
  PM-runtime: Switch accounting over to ktime_get_mono_fast_ns()
  PM-runtime: Optimize pm_runtime_autosuspend_expiration()
  PM-runtime: Replace jiffies-based accounting with ktime-based accounting
  PM-runtime: update accounting_timestamp on enable
  PM: clock_ops: fix missing clk_prepare() return value check
  drm/i915: Move on the new pm runtime interface
  PM-runtime: Add new interface to get accounted time

* pm-sleep:
  PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event()

* pm-qos:
  PM: QoS: no need to check return value of debugfs_create functions

* pm-domains:
  PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name()
  PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name()
  PM: domains: no need to check return value of debugfs_create functions

* pm-em:
  PM / EM: Expose the Energy Model in debugfs

Merge branches 'acpi-video' and 'acpi-x86'

* acpi-video:
  ACPI / video: Extend chassis-type detection with a "Lunch Box" check
  ACPI / video: Refactor and fix dmi_is_desktop()

* acpi-x86:
  ACPI / x86: Make PWM2 device always present at Lenovo Yoga Book

Merge branch 'acpi-apei'

* acpi-apei: (29 commits)
  efi: cper: Fix possible out-of-bounds access
  ACPI: APEI: Fix possible out-of-bounds access to BERT region
  MAINTAINERS: Add James Morse to the list of APEI reviewers
  ACPI / APEI: Add support for the SDEI GHES Notification type
  firmware: arm_sdei: Add ACPI GHES registration helper
  ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications
  ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry()
  ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length
  ACPI / APEI: Make GHES estatus header validation more user friendly
  ACPI / APEI: Pass ghes and estatus separately to avoid a later copy
  ACPI / APEI: Let the notification helper specify the fixmap slot
  ACPI / APEI: Move locking to the notification helper
  arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI
  ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors
  ACPI / APEI: Generalise the estatus queue's notify code
  ACPI / APEI: Don't update struct ghes' flags in read/clear estatus
  ACPI / APEI: Remove spurious GHES_TO_CLEAR check
  ...

Merge branches 'acpi-tables', 'acpi-debug', 'acpi-ec' and 'acpi-dptf'

* acpi-tables:
  ACPI/PPTT: Add acpi_pptt_warn_missing() to consolidate logs
  ACPI / tables: table override from built-in initrd

* acpi-debug:
  ACPI: debug: Clean up acpi_aml_init()
  ACPI: no need to check return value of debugfs_create functions

* acpi-ec:
  Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
  ACPI: EC: Simplify boot EC checks in acpi_ec_add()
  ACPI: EC: Eliminate acpi_config_boot_ec()
  ACPI: EC: Make acpi_ec_dsdt_probe() more straightforward
  ACPI: EC: Make acpi_ec_ecdt_probe() more straightforward
  ACPI: EC: Declare boot_ec as static
  ACPI: EC: Clean up probing for early EC

* acpi-dptf:
  ACPI / DPTF: remove header search path to the parent directory

Merge branch 'acpica'

* acpica:
  ACPICA: Update version to 20190215
  ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting
  ACPICA: ACPI 6.3: add GTDT Revision 3 support
  ACPICA: ACPI 6.3: HMAT updates
  ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
  ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value
  ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC
  ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter
  ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable
  ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT
  ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG
  ACPICA: Update/clarify messages for control method failures
  ACPICA: Debugger: Fix possible fault with the "test objects" command
  ACPICA: Interpreter: Emit warning for creation of a zero-length op region
  ACPICA: Remove legacy module-level code support
  ACPICA: Get rid of acpi_sleep_dispatch()
  ACPICA: Update version to 20190108
  ACPICA: All acpica: Update copyrights to 2019
  ACPICA: acpiexec: Add option to dump extra info for memory leaks
  ACPICA: Convert more ACPI errors to firmware errors

bpf: add test cases for non-pointer sanitiation logic

Add two additional tests for further asserting the
BPF_ALU_NON_POINTER logic with cases that were missed
previously.

Cc: Marek Majkowski <[email protected]>
Cc: Arthur Fabre <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Merge branch 'mlxsw-minimal-Add-ethtool-and-resource-query-support'

Ido Schimmel says:

====================
mlxsw: minimal: Add ethtool and resource query support

Vadim says:

The minimal driver is chip independent and uses I2C bus for chip access.
Its purpose is to support chassis management on systems equipped with
Mellanox switch ASICs. For example, from a BMC (Board Management
Controller) device.

Patches #1-#3 add ethtool support to the minimal driver so that QSFP/SFP
module info could be retrieved by the driver. This is done by exposing a
dummy netdev for each front panel port and implementing the required
ethtool operations.

Patches #4-#8 add resource query support. This allows the driver to
query the firmware about values of certain resources (e.g., maximum
number of ports). It is required on systems where the maximum number of
ports is larger than the hard coded default (64).
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: i2c: Extend initialization by querying resources data

Extend initialization flow by query requests for chip resources data in
order to obtain chip's specific capabilities, like the number of ports.

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: i2c: Extend input parameters list of command API

Extend input parameters list of command API in mlxsw_i2c_cmd() in order
to support initialization commands. Up until now, only access commands
were supported by I2C driver.

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: i2c: Modify input parameter name in initialization API

Change input parameter name "resource" to "res" in mlxsw_i2c_init() in
order to align it with mlxsw_pci_init().

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: i2c: Fix comment misspelling

Fix comment for mlxsw_i2c_write_cmd().

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: core: Move resource query API to common location

Move mlxsw_pci_resources_query() to a common location to allow reuse by
the different drivers and over all the supported physical buses. Rename
it to mlxsw_core_resources_query().

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: minimal: Add ethtool support

The minimal driver is chip independent and uses I2C bus for chip access.
Its purpose is to support chassis management on systems equipped with
Mellanox switch ASICs. For example from BMC (Board Management
Controller) device.

Expose a dummy netdev for each front panel port and implement basic
ethtool operations to obtain QSFP/SFP module info through ethtool.

Signed-off-by: Vadim Pasternak <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: minimal: Make structures and variables names shorter

Replace "mlxsw_minimal" by "mlxsw_m" in order to improve code
readability.

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: core: Move ethtool module callbacks to a common location

Move the implementation of ethtool module callbacks - .get_module_info()
and .get_module_eeprom() - to a common location to allow reuse by the
different mlxsw drivers.

Signed-off-by: Vadim Pasternak <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tls-Fix-issues-in-tls_device'

Boris Pismenny says:

====================
tls: Fix issues in tls_device

This series fixes issues encountered in tls_device code paths,
which were introduced recently.

Additionally, this series includes a fix for tls software only receive flow,
which causes corruption of payload received by user space applications.

This series was tested using the OpenSSL integration of KTLS -
https://github.com/mellan
====================

Signed-off-by: David S. Miller <[email protected]>

tls: Fix tls_device receive

Currently, the receive function fails to handle records already
decrypted by the device due to the commit mentioned below.

This commit advances the TLS record sequence number and prepares the context
to handle the next record.

Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Boris Pismenny <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: Fix mixing between async capable and async

Today, tls_sw_recvmsg is capable of using asynchronous mode to handle
application data TLS records. Moreover, it assumes that if the cipher
can be handled asynchronously, then all packets will be processed
asynchronously.

However, this assumption is not always true. Specifically, for AES-GCM
in TLS1.2, it causes data corruption, and breaks user applications.

This patch fixes this problem by separating the async capability from
the decryption operation result.

Fixes: c0ab4732d4c6 ("net/tls: Do not use async crypto for non-data records")
Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: Fix write space handling

TLS device cannot use the sw context. This patch returns the original
tls device write space handler and moves the sw/device specific portions
to the relevant files.

Also, we remove the write_space call for the tls_sw flow, because it
handles partial records in its delayed tx work handler.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Boris Pismenny <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: Fix tls_device handling of partial records

Cleanup the handling of partial records while fixing a bug where the
tls_push_pending_closed_record function is using the software tls
context instead of the hardware context.

The bug resulted in the following crash:
[   88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[   88.793271] #PF error: [normal kernel read fault]
[   88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
[   88.795958] Oops: 0000 [#1] SMP PTI
[   88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
[   88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[   88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
[   88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
[   88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
[   88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
[   88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
[   88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
[   88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
[   88.814506] FS:  00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
[   88.816337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
[   88.819328] Call Trace:
[   88.820123]  tls_push_data+0x628/0x6a0 [tls]
[   88.821283]  ? remove_wait_queue+0x20/0x60
[   88.822383]  ? n_tty_read+0x683/0x910
[   88.823363]  tls_device_sendmsg+0x53/0xa0 [tls]
[   88.824505]  sock_sendmsg+0x36/0x50
[   88.825492]  sock_write_iter+0x87/0x100
[   88.826521]  __vfs_write+0x127/0x1b0
[   88.827499]  vfs_write+0xad/0x1b0
[   88.828454]  ksys_write+0x52/0xc0
[   88.829378]  do_syscall_64+0x5b/0x180
[   88.830369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   88.831603] RIP: 0033:0x7fcad1451680

[ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1248.472564] #PF error: [normal kernel read fault]
[ 1248.473790] PGD 0 P4D 0
[ 1248.474642] Oops: 0000 [#1] SMP PTI
[ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G           OE 5.0.0-rc4+ #3
[ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
[ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
[ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
[ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
[ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
[ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
[ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
[ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
[ 1248.494923] FS:  0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
[ 1248.496847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
[ 1248.500136] Call Trace:
[ 1248.500998]  ? tcp_check_oom+0xd0/0xd0
[ 1248.502106]  tls_sk_proto_close+0x127/0x1e0 [tls]
[ 1248.503411]  inet_release+0x3c/0x60
[ 1248.504530]  __sock_release+0x3d/0xb0
[ 1248.505611]  sock_close+0x11/0x20
[ 1248.506612]  __fput+0xb4/0x220
[ 1248.507559]  task_work_run+0x88/0xa0
[ 1248.508617]  do_exit+0x2cb/0xbc0
[ 1248.509597]  ? core_sys_select+0x17a/0x280
[ 1248.510740]  do_group_exit+0x39/0xb0
[ 1248.511789]  get_signal+0x1d0/0x630
[ 1248.512823]  do_signal+0x36/0x620
[ 1248.513822]  exit_to_usermode_loop+0x5c/0xc6
[ 1248.515003]  do_syscall_64+0x157/0x180
[ 1248.516094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1248.517456] RIP: 0033:0x7fb398bd3f53
[ 1248.518537] Code: Bad RIP value.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Boris Pismenny <[email protected]>
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-phy-clean-up-the-old-gen10g-functions'

Heiner Kallweit says:

====================
net: phy: clean up the old gen10g functions

The old gen10g_ functions are mainly stubs and have been superseded
by genphy_c45_ equivalents. So lets remove / hide the old functions
as far as possible.
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: remove gen10g_no_soft_reset

genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops,
one is enough.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: don't export gen10g_read_status

gen10g_read_status is deprecated, therefore stop exporting it.
We don't want to encourage anybody to use it.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: remove gen10g_config_init

ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported
and advertising bitmap because it's part of PHY_10GBIT_FEATURES.
And all users of gen10g_config_init use PHY_10GBIT_FEATURES.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: remove gen10g_suspend and gen10g_resume

phy_suspend() and phy_resume() are no-ops anyway if no callback is
defined. Therefore we don't need these stubs.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: use genphy_c45_aneg_done in genphy_aneg_done

Now that we have it let's use genphy_c45_aneg_done() in phy_aneg_done().

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qmi_wwan: Add support for Quectel EG12/EM12

Quectel EG12 (module)/EM12 (M.2 card) is a Cat. 12 LTE modem. The modem
behaves in the same way as the EP06, so the "set DTR"-quirk must be
applied and the diagnostic-interface check performed. Since the
diagnostic-check now applies to more modems, I have renamed the function
from quectel_ep06_diag_detected() to quectel_diag_detected().

Signed-off-by: Kristian Evensen <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv8e6xxx: fix number of internal PHYs for 88E6x90 family

Ports 9 and 10 don't have internal PHY's but are (dependent on the
version) SERDES/SGMII/XAUI/RXAUI ports.

v2:
- fix it for all 88E6x90 family members

Fixes: bc3931557d1d ("net: dsa: mv88e6xxx: Add number of internal PHYs")
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-sysfs: Fix mem leak in netdev_register_kobject

syzkaller report this:
BUG: memory leak
unreferenced object 0xffff88837a71a500 (size 256):
  comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff  ........ .......
  backtrace:
    [<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
    [<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
    [<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline]
    [<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
    [<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline]
    [<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
    [<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
    [<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline]
    [<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline]
    [<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
    [<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    [<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<00000000115be9bb>] 0xffffffffffffffff

It should call kset_unregister to free 'dev->queues_kset'
in error path of register_queue_kobjects, otherwise will cause a mem leak.

Reported-by: Hulk Robot <[email protected]>
Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering")
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fsl/fman: Use vsprintf extension %pM

Make logging of an ethernet address more consistent with
the rest of the kernel.

Miscellanea:

The %02hx use also did not quite match the u8 definition
of addr though that did not actually matter given normal
integer promotion rules.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE

By default IPv6 socket with IPV6_ROUTER_ALERT socket option set will
receive all IPv6 RA packets from all namespaces.
IPV6_ROUTER_ALERT_ISOLATE socket option restricts packets received by
the socket to be only from the socket's namespace.

Signed-off-by: Maxim Martynov <[email protected]>
Signed-off-by: Francesco Ruggeri <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex

When testing another issue I faced the problem that
mv88e6xxx_port_setup_mac() failed due to DUPLEX_UNKNOWN being passed
as argument to mv88e6xxx_port_set_duplex(). We should handle this case
gracefully and return -EOPNOTSUPP, like e.g. mv88e6xxx_port_set_speed()
is doing it.

Fixes: 7f1ae07b51e8 ("net: dsa: mv88e6xxx: add port duplex setter")
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fixup address-space warnings in compat_mc_{get,set}sockopt()

Add __user attributes in some of the casts in this function to avoid
the following sparse warnings:

net/compat.c:592:57: warning: cast removes address space of expression
net/compat.c:592:57: warning: incorrect type in initializer (different address spaces)
net/compat.c:592:57:    expected struct compat_group_req [noderef] <asn:1>*gr32
net/compat.c:592:57:    got void *<noident>
net/compat.c:613:65: warning: cast removes address space of expression
net/compat.c:613:65: warning: incorrect type in initializer (different address spaces)
net/compat.c:613:65:    expected struct compat_group_source_req [noderef] <asn:1>*gsr32
net/compat.c:613:65:    got void *<noident>
net/compat.c:634:60: warning: cast removes address space of expression
net/compat.c:634:60: warning: incorrect type in initializer (different address spaces)
net/compat.c:634:60:    expected struct compat_group_filter [noderef] <asn:1>*gf32
net/compat.c:634:60:    got void *<noident>
net/compat.c:672:52: warning: cast removes address space of expression
net/compat.c:672:52: warning: incorrect type in initializer (different address spaces)
net/compat.c:672:52:    expected struct compat_group_filter [noderef] <asn:1>*gf32
net/compat.c:672:52:    got void *<noident>

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: Use prepare/commit phase in dsa_slave_vlan_rx_add_vid()

We were skipping the prepare phase which causes some problems with at
least a couple of drivers:

- mv88e6xxx chooses to skip programming VID = 0 with -EOPNOTSUPP in
  the prepare phase, but we would still try to force this VID since we
  would only call the commit phase and so we would get the driver to
  return -EINVAL instead

- qca8k does not currently have a port_vlan_add() callback implemented,
  yet we would try to call that unconditionally leading to a NPD

Fix both issues by conforming to the current model doing a
prepare/commit phase, this makes us consistent throughout the code and
assumptions.

Reported-by: Heiner Kallweit <[email protected]>
Reported-by: Michal Vokáč <[email protected]>
Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dpaa2-eth-add-XDP_REDIRECT-support'

Ioana Ciornei says:

====================
dpaa2-eth: add XDP_REDIRECT support

The first patch adds different software annotation types for Tx frames
depending on frame type while the second one actually adds support for basic
XDP_REDIRECT.

Changes in v2:
- add missing xdp_do_flush_map() call
====================

Signed-off-by: David S. Miller <[email protected]>

dpaa2-eth: add XDP_REDIRECT support

Implement support for the XDP_REDIRECT action.

The redirected frame is transmitted and confirmed on the regular Tx/Tx
conf queues. Frame is marked with the "XDP" type in the software
annotation, since it requires special treatment.

We don't have good hardware support for TX batching, so the
XDP_XMIT_FLUSH flag doesn't make a difference for now; ndo_xdp_xmit
performs the actual Tx operation on the spot.

Signed-off-by: Ioana Ciornei <[email protected]>
Signed-off-by: Ioana Radulescu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dpaa2-eth: Add software annotation types

We write different metadata information in the software annotation
area of Tx frames, depending on frame type. Make this more explicit
by introducing a type field and separate structures for single buffer
and scatter-gather frames.

Signed-off-by: Ioana Radulescu <[email protected]>
Signed-off-by: Ioana Ciornei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sched-Patches-from-out-of-tree-version-of-sch_cake'

Toke Høiland-Jørgensen says:

====================
sched: Patches from out-of-tree version of sch_cake

This series includes a couple of patches with updates from the out-of-tree
version of sch_cake. The first one is a fix to the fairness scheduling when
dual-mode fairness is enabled. The second patch is an additional feature flag
that allows using fwmark as a tin selector, as a convenience for people who want
to customise tin selection. The third patch is just a cleanup to the tin
selection logic.
====================

Signed-off-by: David S. Miller <[email protected]>

sch_cake: Simplify logic in cake_select_tin()

With more modes added the logic in cake_select_tin() was getting a bit
hairy, and it turns out we can actually simplify it quite a bit. This also
allows us to get rid of one of the two diffserv parsing functions, which
has the added benefit that already-zeroed DSCP fields won't get re-written.

Suggested-by: Kevin Darbyshire-Bryant <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sch_cake: Permit use of connmarks as tin classifiers

Add flag 'FWMARK' to enable use of firewall connmarks as tin selector.
The connmark (skbuff->mark) needs to be in the range 1->tin_cnt ie.
for diffserv3 the mark needs to be 1->3.

Background

Typically CAKE uses DSCP as the basis for tin selection.  DSCP values
are relatively easily changed as part of the egress path, usually with
iptables & the mangle table, ingress is more challenging.  CAKE is often
used on the WAN interface of a residential gateway where passthrough of
DSCP from the ISP is either missing or set to unhelpful values thus use
of ingress DSCP values for tin selection isn't helpful in that
environment.

An approach to solving the ingress tin selection problem is to use
CAKE's understanding of tc filters.  Naive tc filters could match on
source/destination port numbers and force tin selection that way, but
multiple filters don't scale particularly well as each filter must be
traversed whether it matches or not. e.g. a simple example to map 3
firewall marks to tins:

MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1
tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2
tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3

Another option is to use eBPF cls_act with tc filters e.g.

MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o

This has the disadvantages of a) needing someone to write & maintain
the bpf program, b) a bpf toolchain to compile it and c) needing to
hardcode the major number in the bpf program so it matches the cake
instance (or forcing the cake instance to a particular major number)
since the major number cannot be passed to the bpf program via tc
command line.

As already hinted at by the previous examples, it would be helpful
to associate tins with something that survives the Internet path and
ideally allows tin selection on both egress and ingress.  Netfilter's
conntrack permits setting an identifying mark on a connection which
can also be restored to an ingress packet with tc action connmark e.g.

tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1

Since tc's connmark action has restored any connmark into skb->mark,
any of the previous solutions are based upon it and in one form or
another copy that mark to the skb->priority field where again CAKE
picks this up.

This change cuts out at least one of the (less intuitive &
non-scalable) middlemen and permit direct access to skb->mark.

Signed-off-by: Kevin Darbyshire-Bryant <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sch_cake: Make the dual modes fairer

CAKE host fairness does not work well with TCP flows in dual-srchost and
dual-dsthost setup. The reason is that ACKs generated by TCP flows are
classified as sparse flows, and affect flow isolation from other hosts. Fix
this by calculating host_load based only on the bulk flows a host
generates. In a hash collision the host_bulk_flow_count values must be
decremented on the old hosts and incremented on the new ones *if* the queue
is in the bulk set.

Reported-by: Pete Heist <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge v5.0 into drm-next

There is a really hairy resolution involving amdgpu fixes, that I'd rather confirm here.

Also some misc fixes are landed by me, but the pr has them as well.

Signed-off-by: Dave Airlie <[email protected]>

spi: sh-msiof: Restrict bits per word to 8/16/24/32 on R-Car Gen2/3

While the MSIOF variants in older SuperH and SH/R-Mobile SoCs support
bits-per-word values in the full range 8..32, the variants present in
R-Car Gen2 and Gen3 SoCs are restricted to 8, 16, 24, or 32.

Obtain the value from family-specific sh_msiof_chipdata to fix this.

Reported-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: mc13xxx: Constify regulator_ops variables

These regulator_ops variables should never change, make them const.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: palmas: Constify palmas_smps_ramp_delay array

The palmas_smps_ramp_delay array should never modify, make it const.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: wm831x-dcdc: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Charles Keepax <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: pv88090: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Steve Twiss <[email protected]>;
Signed-off-by: Mark Brown <[email protected]>

regulator: pv88080: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Steve Twiss <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: pv88060: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Steve Twiss <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

Merge branch 'for-5.0' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into regulator-5.1

regulator: max77650: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: lp873x: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: lp872x: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: da9210: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Steve Twiss <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: da9055: Convert to use regulator_set/get_current_limit_regmap

Use regulator_set/get_current_limit_regmap helpers to save some code.

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Steve Twiss <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: core: Add set/get_current_limit helpers for regmap users

By setting curr_table, n_current_limits, csel_reg and csel_mask, the
regmap users can use regulator_set_current_limit_regmap and
regulator_get_current_limit_regmap for set/get_current_limit callbacks.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

regulator: Fix comment for csel_reg and csel_mask

The csel_reg and csel_mask fields in struct regulator_desc needs to
be generic for drivers. Not just for TPS65218.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

Linux 5.0

rtc: tx4939: use .set_time

Switch from .set_mmss to .set_time as the former is deprecated.

Signed-off-by: Alexandre Belloni <[email protected]>

rtc: tx4939: switch to rtc_time64_to_tm/rtc_tm_to_time64

Call the 64bit versions of rtc_time_to_tm now that the range is enforced by
the core.

Signed-off-by: Alexandre Belloni <[email protected]>

rtc: tx4939: set range

The TX4939 RTC is a 48bit counter that counts two on every clock edge of
32.768 KHz oscillator clock so it counts 32bit seconds.

Signed-off-by: Alexandre Belloni <[email protected]>

rtc: tx4939: remove useless test

The tested condition will never happen as the core always passes a fully
set struct tm (using rtc_ktime_to_tm) to the .set_alarm callback.

Signed-off-by: Alexandre Belloni <[email protected]>

Merge branch 'Macb-power-management-support-for-ZynqMP'

Harini Katakam says:

====================
Macb power management support for ZynqMP

This series adds support for macb suspend/resume with system power down.
In relation to the above, this series also updates mdio_read/write
function for PM and adds tsu clock management.
====================

Signed-off-by: David S. Miller <[email protected]>

net: macb: Add support for suspend/resume with full power down

When macb device is suspended and system is powered down, the clocks
are removed and hence macb should be closed gracefully and restored
upon resume. This patch does the same by switching off the net device,
suspending phy and performing necessary cleanup of interrupts and BDs.
Upon resume, all these are reinitialized again.

Reset of macb device is done only when GEM is not a wake device.
Even when gem is a wake device, tx queues can be stopped and ptp device
can be closed (tsu clock will be disabled in pm_runtime_suspend) as
wake event detection has no dependency on this.

Signed-off-by: Kedareswara rao Appana <[email protected]>
Signed-off-by: Harini Katakam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: macb: Add pm runtime support

Add runtime pm functions and move clock handling there.
Add runtime PM calls to mdio functions to allow for active mdio bus.

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Signed-off-by: Harini Katakam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: macb: Support clock management for tsu_clk

TSU clock needs to be enabled/disabled as per support in devicetree
and it should also be controlled during suspend/resume (WOL has no
dependency on this clock).

Signed-off-by: Harini Katakam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: macb: Check MDIO state before read/write and use timeouts

Replace the while loop in MDIO read/write functions with a timeout.
In addition, add a check for MDIO bus busy before initiating a new
operation as well to make sure there is no ongoing MDIO operation.

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Signed-off-by: Sai Pavan Boddu <[email protected]>
Signed-off-by: Harini Katakam <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-dsa-microchip-add-KSZ9893-switch-support'

Tristram Ha says:

====================
net: dsa: microchip: add KSZ9893 switch support

This series of patches is to modify the KSZ9477 DSA driver to support
running KSZ9893 switch.

The KSZ9893 switch is similar to KSZ9477 except the ingress tail tag has
1 byte instead of 2 bytes. The XMII register that governs the MAC
communication also has different register definitions.

v1
- Put KSZ9893 tagging in separate patch
- Remove other switch support
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: add KSZ9893 switch support

Add KSZ9893 switch support in KSZ9477 driver. This switch is similar to
KSZ9477 except the ingress tail tag has 1 byte instead of 2 bytes, so
KSZ9893 tagging will be used.

The XMII register that governs how the host port communicates with the
MAC also has different register definitions.

Signed-off-by: Tristram Ha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: add KSZ9893 switch tagging support

KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag
has 1 byte instead of 2 bytes. The size of the portmap is smaller and
so the override and lookup bits are also moved.

Signed-off-by: Tristram Ha <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: dsa: document additional Microchip KSZ9477 family switches

Document additional Microchip KSZ9477 family switches.

Show how KSZ8565 switch should be configured as the host port is port 7
instead of port 5.

Signed-off-by: Tristram Ha <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rtc: zynqmp: let the core handle range

Let the core handle the RTC range instead of open coding it.

Signed-off-by: Alexandre Belloni <[email protected]>

rtc: zynqmp: fix possible race condition

The IRQ is requested before the struct rtc is allocated and registered, but
this struct is used in the IRQ handler. This may lead to a NULL pointer
dereference.

Switch to devm_rtc_allocate_device/rtc_register_device to allocate the rtc
struct before requesting the IRQ.

Signed-off-by: Alexandre Belloni <[email protected]>

rtc: imx-sc: use rtc_time64_to_tm

The imx-sc driver properly sets range_max, use rtc_time64_to_tm() instead
of the deprecated rtc_time_to_tm()

Signed-off-by: Alexandre Belloni <[email protected]>

Merge branch 'appletalk-small-cleanup-and-bugfix'

Yue Haibing says:

====================
appletalk: small cleanup and bugfix

v2:
- Add cover letter log

This patch series mainly fix a use-after-free bug in atalk_proc_exit.
patch 1 use remove_proc_subtree helper to simplify atalk_proc fs code,
also some other cleanup.
patch 2 add proper error cleanup path in atalk_init to fix the issue, which
based on the patch 1 because of the change of atalk_proc_exit context.
====================

Signed-off-by: David S. Miller <[email protected]>

appletalk: Fix use-after-free in atalk_proc_exit

KASAN report this:

BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806

CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xfa/0x1ce lib/dump_stack.c:113
print_address_description+0x65/0x270 mm/kasan/report.c:187
kasan_report+0x149/0x18d mm/kasan/report.c:317
pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
atalk_proc_exit+0x18/0x820 [appletalk]
atalk_exit+0xf/0x5a [appletalk]
__do_sys_delete_module kernel/module.c:1018 [inline]
__se_sys_delete_module kernel/module.c:961 [inline]
__x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff

Allocated by task 2806:
set_track mm/kasan/common.c:85 [inline]
__kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2739 [inline]
slab_alloc mm/slub.c:2747 [inline]
kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
kmem_cache_zalloc include/linux/slab.h:730 [inline]
__proc_create+0x30f/0xa20 fs/proc/generic.c:408
proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
0xffffffffc10c01bb
0xffffffffc10c0166
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2806:
set_track mm/kasan/common.c:85 [inline]
__kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
slab_free_hook mm/slub.c:1409 [inline]
slab_free_freelist_hook mm/slub.c:1436 [inline]
slab_free mm/slub.c:2986 [inline]
kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
pde_put+0x6e/0x80 fs/proc/generic.c:647
remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
0xffffffffc10c031c
0xffffffffc10c0166
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8881f41fe500
which belongs to the cache proc_dir_entry of size 256
The buggy address is located 176 bytes inside of
256-byte region [ffff8881f41fe500, ffff8881f41fe600)
The buggy address belongs to the page:
page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

It should check the return value of atalk_proc_init fails,
otherwise atalk_exit will trgger use-after-free in pde_subdir_find
while unload the module.This patch fix error cleanup path of atalk_init

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

appletalk: use remove_proc_subtree to simplify procfs code

Use remove_proc_subtree to remove the whole subtree
on cleanup.Also do some cleanup.

Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mscc: Enable all ports in QSGMII

When Ocelot phy-mode is QSGMII, all 4 ports involved in
QSGMII shall be kept out of reset and
Tx lanes shall be enabled to pass the data.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Kavya Sree Kotagiri <[email protected]>
Signed-off-by: Steen Hegelund <[email protected]>
Co-developed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/32: Clear on-stack exception marker upon exception return

Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
to avoid confusing stacktrace like the one below.

  Call Trace:
  [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895130] memchr+0x24/0x74
  [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
  --- interrupt: c0e9df00 at 0x400f330
      LR = init_stack+0x1f00/0x2000
  [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

With this patch the trace becomes:

  Call Trace:
  [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895150] memchr+0x24/0x74
  [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
  [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

staging: mt7621-dma: remove license boilerplate text

Remove license boilerplate text.

Signed-off-by: Jules Irenge <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

staging: mt7621-dma: add SPDX GPL-2.0+ license identifier

Add SPDX GPL-2.0+ license to fix missing SPDX warning
reported by checkpatch.pl.

Signed-off-by: Jules Irenge <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"One more set of simple ARM platform fixes:

   - A boot regression on qualcomm msm8998

   - Gemini display controllers got turned off by accident

   - incorrect reference counting in optee"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  tee: optee: add missing of_node_put after of_device_is_available
  arm64: dts: qcom: msm8998: Extend TZ reserved memory area
  ARM: dts: gemini: Re-enable display controller

net: sched: put back q.qlen into a single location

In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.

But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated
  when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.

For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())

This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>