====================
bpf_link abstraction itself was formalized in [0] with justifications for why
its semantics is a good fit for attaching BPF programs of various types. This
patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
programs.
Cgroup BPF link is semantically compatible with current BPF_F_ALLOW_MULTI
semantics of attaching cgroup BPF programs directly. Thus cgroup bpf_link can
co-exist with legacy BPF program multi-attachment.
bpf_link is destroyed and automatically detached when the last open FD holding
the reference to bpf_link is closed. This means that by default, when the
process that created bpf_link exits, attached BPF program will be
automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
any other bpf_link, can be pinned in BPF FS and by those means survive the
exit of process that created the link. This is useful in many scenarios to
provide long-living BPF program attachments. Pinning also means that there
could be many owners of bpf_link through independent FDs.
Additionally, auto-detachmet of cgroup bpf_link is implemented. When cgroup is
dying it will automatically detach all active bpf_links. This ensures that
cgroup clean up is not delayed due to active bpf_link even despite no chance
for any BPF program to be run for a given cgroup. In that sense it's similar
to existing behavior of dropping refcnt of attached bpf_prog. But in the case
of bpf_link, bpf_link is not destroyed and is still available to user as long
as at least one active FD is still open (or if it's pinned in BPF FS).
There are two main cgroup-specific differences between bpf_link-based and
direct bpf_prog-based attachment.
First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
bpf_link, which makes it possible to auto-clean up attached bpf_link when user
process abruptly exits without explicitly detaching BPF program. This makes
for a safe default behavior proven in BPF tracing program types. But bpf_link
doesn't bump cgroup->bpf.refcnt as well and because of that doesn't prevent
cgroup from cleaning up its BPF state.
Second, only owners of bpf_link (those who created bpf_link in the first place
or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
it. This makes sure that no other process can accidentally remove/replace BPF
program.
This patch set also implements LINK_UPDATE sub-command, which allows to
replace bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
is supposed to be generic command for different types of bpf_links.
Andrii Nakryiko [Mon, 30 Mar 2020 03:00:01 +0000 (20:00 -0700)]
selftests/bpf: Test FD-based cgroup attachment
Add selftests to exercise FD-based cgroup BPF program attachments and their
intermixing with legacy cgroup BPF attachments. Auto-detachment and program
replacement (both unconditional and cmpxchng-like) are tested as well.
Andrii Nakryiko [Mon, 30 Mar 2020 03:00:00 +0000 (20:00 -0700)]
libbpf: Add support for bpf_link-based cgroup attachment
Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
If expected_attach_type is not specified explicitly with
bpf_program__set_expected_attach_type(), libbpf will try to determine proper
attach type from BPF program's section definition.
Also add support for bpf_link's underlying BPF program replacement:
- unconditional through high-level bpf_link__update_program() API;
- cmpxchg-like with specifying expected current BPF program through
low-level bpf_link_update() API.
Andrii Nakryiko [Mon, 30 Mar 2020 02:59:59 +0000 (19:59 -0700)]
bpf: Implement bpf_prog replacement for an active bpf_cgroup_link
Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.
For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, no changes are
performed, old bpf_link stays intact and attached, operation returns
a failure.
cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.
Andrii Nakryiko [Mon, 30 Mar 2020 02:59:58 +0000 (19:59 -0700)]
bpf: Implement bpf_link-based cgroup BPF program attachment
Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. Cgroup bpf_link supports only
BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
attachments can be freely intermixed.
To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.
The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.
Linus Torvalds [Tue, 31 Mar 2020 00:35:14 +0000 (17:35 -0700)]
Merge tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Treewide:
- Cleanup of setup_irq() which is not longer required because the
memory allocator is available early.
Most cleanup changes come through the various maintainer trees, so
the final removal of setup_irq() is postponed towards the end of
the merge window.
Core:
- Protection against unsafe invocation of interrupt handlers and
unsafe interrupt injection including a fixup of the offending
PCI/AER error injection mechanism.
Invoking interrupt handlers from arbitrary contexts, i.e. outside
of an actual interrupt, can cause inconsistent state on the
fragile x86 interrupt affinity changing hardware trainwreck.
Drivers:
- Second wave of support for the new ARM GICv4.1
- Multi-instance support for Xilinx and PLIC interrupt controllers
- CPU-Hotplug support for PLIC
- The obligatory new driver for X1000 TCU
- Enhancements, cleanups and fixes all over the place"
* tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
unicore32: Replace setup_irq() by request_irq()
sh: Replace setup_irq() by request_irq()
hexagon: Replace setup_irq() by request_irq()
c6x: Replace setup_irq() by request_irq()
alpha: Replace setup_irq() by request_irq()
irqchip/gic-v4.1: Eagerly vmap vPEs
irqchip/gic-v4.1: Add VSGI property setup
irqchip/gic-v4.1: Add VSGI allocation/teardown
irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
irqchip/gic-v4.1: Add initial SGI configuration
irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
irqchip/stm32: Retrigger both in eoi and unmask callbacks
irqchip/gic-v3: Move irq_domain_update_bus_token to after checking for NULL domain
irqchip/xilinx: Do not call irq_set_default_host()
irqchip/xilinx: Enable generic irq multi handler
irqchip/xilinx: Fill error code when irq domain registration fails
irqchip/xilinx: Add support for multiple instances
...
Randy Dunlap [Sun, 29 Mar 2020 16:12:31 +0000 (09:12 -0700)]
staging/octeon: fix up merge error
There's a semantic conflict in the Octeon staging network driver, which
used the skb_reset_tc() function to reset skb state when re-using an
skb. But that inline helper function was removed in mainline by commit 2c64605b590e ("net: Fix CONFIG_NET_CLS_ACT=n and
CONFIG_NFT_FWD_NETDEV={y, m} build").
Fix it by using skb_reset_redirect() instead. Also move it out of the
This code path only ends up triggering if REUSE_SKBUFFS_WITHOUT_FREE is
enabled, which in turn only happens if you don't have CONFIG_NETFILTER
configured. Which was how this wasn't caught by the usual allmodconfig
builds.
Linus Torvalds [Tue, 31 Mar 2020 00:01:51 +0000 (17:01 -0700)]
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Various NUMA scheduling updates: harmonize the load-balancer and
NUMA placement logic to not work against each other. The intended
result is better locality, better utilization and fewer migrations.
- Introduce Thermal Pressure tracking and optimizations, to improve
task placement on thermally overloaded systems.
- Implement frequency invariant scheduler accounting on (some) x86
CPUs. This is done by observing and sampling the 'recent' CPU
frequency average at ~tick boundaries. The CPU provides this data
via the APERF/MPERF MSRs. This hopefully makes our capacity
estimates more precise and keeps tasks on the same CPU better even
if it might seem overloaded at a lower momentary frequency. (As
usual, turbo mode is a complication that we resolve by observing
the maximum frequency and renormalizing to it.)
- Add asymmetric CPU capacity wakeup scan to improve capacity
utilization on asymmetric topologies. (big.LITTLE systems)
- Optimize the CONFIG_RT_GROUP_SCHED constraints code.
- Misc fixes, cleanups and optimizations - see the changelog for
details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
threads: Update PID limit comment according to futex UAPI change
sched/fair: Fix condition of avg_load calculation
sched/rt: cpupri_find: Trigger a full search as fallback
kthread: Do not preempt current task if it is going to call schedule()
sched/fair: Improve spreading of utilization
sched: Avoid scale real weight down to zero
psi: Move PF_MEMSTALL out of task->flags
MAINTAINERS: Add maintenance information for psi
psi: Optimize switching tasks inside shared cgroups
psi: Fix cpu.pressure for cpu.max and competing cgroups
sched/core: Distribute tasks within affinity masks
sched/fair: Fix enqueue_task_fair warning
thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
sched/rt: Remove unnecessary push for unfit tasks
sched/rt: Allow pulling unfitting task
sched/rt: Optimize cpupri_find() on non-heterogenous systems
sched/rt: Re-instate old behavior in select_task_rq_rt()
sched/rt: cpupri_find: Implement fallback mechanism for !fit case
sched/fair: Fix reordering of enqueue/dequeue_task_fair()
sched/fair: Fix runnable_avg for throttled cfs
...
Scott Mayhew [Tue, 3 Mar 2020 22:58:37 +0000 (17:58 -0500)]
NFS: Ensure security label is set for root inode
When using NFSv4.2, the security label for the root inode should be set
via a call to nfs_setsecurity() during the mount process, otherwise the
inode will appear as unlabeled for up to acdirmin seconds. Currently
the label for the root inode is allocated, retrieved, and freed entirely
witin nfs4_proc_get_root().
Add a field for the label to the nfs_fattr struct, and allocate & free
the label in nfs_get_root(), where we also add a call to
nfs_setsecurity(). Note that for the call to nfs_setsecurity() to
succeed, it's necessary to also move the logic calling
security_sb_{set,clone}_security() from nfs_get_tree_common() down into
nfs_get_root()... otherwise the SBLABEL_MNT flag will not be set in the
super_block's security flags and nfs_setsecurity() will silently fail.
Linus Torvalds [Mon, 30 Mar 2020 23:40:08 +0000 (16:40 -0700)]
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"The main changes in this cycle were:
Kernel side changes:
- A couple of x86/cpu cleanups and changes were grandfathered in due
to patch dependencies. These clean up the set of CPU model/family
matching macros with a consistent namespace and C99 initializer
style.
- A bunch of updates to various low level PMU drivers:
* AMD Family 19h L3 uncore PMU
* Intel Tiger Lake uncore support
* misc fixes to LBR TOS sampling
- Documentation changes and updates to core facilities
- misc cleanups, fixes and other enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
cpufreq/intel_pstate: Fix wrong macro conversion
x86/cpu: Cleanup the now unused CPU match macros
hwrng: via_rng: Convert to new X86 CPU match macros
crypto: Convert to new CPU match macros
ASoC: Intel: Convert to new X86 CPU match macros
powercap/intel_rapl: Convert to new X86 CPU match macros
PCI: intel-mid: Convert to new X86 CPU match macros
mmc: sdhci-acpi: Convert to new X86 CPU match macros
intel_idle: Convert to new X86 CPU match macros
extcon: axp288: Convert to new X86 CPU match macros
thermal: Convert to new X86 CPU match macros
hwmon: Convert to new X86 CPU match macros
platform/x86: Convert to new CPU match macros
EDAC: Convert to new X86 CPU match macros
cpufreq: Convert to new X86 CPU match macros
ACPI: Convert to new X86 CPU match macros
x86/platform: Convert to new CPU match macros
x86/kernel: Convert to new CPU match macros
x86/kvm: Convert to new CPU match macros
x86/perf/events: Convert to new CPU match macros
...
Linus Torvalds [Mon, 30 Mar 2020 23:17:15 +0000 (16:17 -0700)]
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued user-access cleanups in the futex code.
- percpu-rwsem rewrite that uses its own waitqueue and atomic_t
instead of an embedded rwsem. This addresses a couple of
weaknesses, but the primary motivation was complications on the -rt
kernel.
- Introduce raw lock nesting detection on lockdep
(CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
lock differences. This too originates from -rt.
- Reuse lockdep zapped chain_hlocks entries, to conserve RAM
footprint on distro-ish kernels running into the "BUG:
MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
chain-entries pool.
- Misc cleanups, smaller fixes and enhancements - see the changelog
for details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Documentation/locking/locktypes: Minor copy editor fixes
Documentation/locking/locktypes: Further clarifications and wordsmithing
m68knommu: Remove mm.h include from uaccess_no.h
x86: get rid of user_atomic_cmpxchg_inatomic()
generic arch_futex_atomic_op_inuser() doesn't need access_ok()
x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
objtool: whitelist __sanitizer_cov_trace_switch()
[parisc, s390, sparc64] no need for access_ok() in futex handling
sh: no need of access_ok() in arch_futex_atomic_op_inuser()
futex: arch_futex_atomic_op_inuser() calling conventions change
completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
lockdep: Add posixtimer context tracing bits
lockdep: Annotate irq_work
lockdep: Add hrtimer context tracing bits
lockdep: Introduce wait-type checks
completion: Use simple wait queues
sched/swait: Prepare usage in completions
...
Linus Torvalds [Mon, 30 Mar 2020 23:13:08 +0000 (16:13 -0700)]
Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The EFI changes in this cycle are much larger than usual, for two
(positive) reasons:
- The GRUB project is showing signs of life again, resulting in the
introduction of the generic Linux/UEFI boot protocol, instead of
x86 specific hacks which are increasingly difficult to maintain.
There's hope that all future extensions will now go through that
boot protocol.
- Preparatory work for RISC-V EFI support.
The main changes are:
- Boot time GDT handling changes
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file
I/O, memory allocation, etc.
- Introduce a generic initrd loading method based on calling back
into the firmware, instead of relying on the x86 EFI handover
protocol or device tree.
- Introduce a mixed mode boot method that does not rely on the x86
EFI handover protocol either, and could potentially be adopted by
other architectures (if another one ever surfaces where one
execution mode is a superset of another)
- Clean up the contents of 'struct efi', and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit
firmware implementations to return EFI_UNSUPPORTED from UEFI
runtime services at OS runtime, and expose a mask of which ones are
supported or unsupported via a configuration table.
- Partial fix for the lack of by-VA cache maintenance in the
decompressor on 32-bit ARM.
- Changes to load device firmware from EFI boot service memory
regions
- Various documentation updates and minor code cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
efi/libstub/arm: Fix spurious message that an initrd was loaded
efi/libstub/arm64: Avoid image_base value from efi_loaded_image
partitions/efi: Fix partition name parsing in GUID partition entry
efi/x86: Fix cast of image argument
efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
efi/x86: Ignore the memory attributes table on i386
efi/x86: Don't relocate the kernel unless necessary
efi/x86: Remove extra headroom for setup block
efi/x86: Add kernel preferred address to PE header
efi/x86: Decompress at start of PE image load address
x86/boot/compressed/32: Save the output address instead of recalculating it
efi/libstub/x86: Deal with exit() boot service returning
x86/boot: Use unsigned comparison for addresses
efi/x86: Avoid using code32_start
efi/x86: Make efi32_pe_entry() more readable
efi/x86: Respect 32-bit ABI in efi32_pe_entry()
efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
...
Linus Torvalds [Mon, 30 Mar 2020 22:52:00 +0000 (15:52 -0700)]
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Make kfree_rcu() use kfree_bulk() for added performance
- RCU updates
- Callback-overload handling updates
- Tasks-RCU KCSAN and sparse updates
- Locking torture test and RCU torture test updates
- Documentation updates
- Miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
rcu: Make rcu_barrier() account for offline no-CBs CPUs
rcu: Mark rcu_state.gp_seq to detect concurrent writes
Documentation/memory-barriers: Fix typos
doc: Add rcutorture scripting to torture.txt
doc/RCU/rcu: Use https instead of http if possible
doc/RCU/rcu: Use absolute paths for non-rst files
doc/RCU/rcu: Use ':ref:' for links to other docs
doc/RCU/listRCU: Update example function name
doc/RCU/listRCU: Fix typos in a example code snippets
doc/RCU/Design: Remove remaining HTML tags in ReST files
doc: Add some more RCU list patterns in the kernel
rcutorture: Set KCSAN Kconfig options to detect more data races
rcutorture: Manually clean up after rcu_barrier() failure
rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
rcuperf: Measure memory footprint during kfree_rcu() test
rcutorture: Annotation lockless accesses to rcu_torture_current
rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
rcutorture: Fix stray access to rcu_fwd_cb_nodelay
rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
rcutorture: Make kvm-find-errors.sh abort on bad directory
...
Linus Torvalds [Mon, 30 Mar 2020 22:32:23 +0000 (15:32 -0700)]
Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"The biggest changes in this cycle were the vmlinux.o optimizations by
Peter Zijlstra, which are preparatory and optimization work to run
objtool against the much richer vmlinux.o object file, to perform
new, whole-program section based logic. That work exposed a handful
of problems with the existing code, which fixes and optimizations are
merged here. The complete 'vmlinux.o and noinstr' work is still work
in progress, targeted for v5.8.
There's also assorted fixes and enhancements from Josh Poimboeuf.
In particular I'd like to draw attention to commit 644592d328370,
which turns fatal objtool errors into failed kernel builds. This
behavior is IMO now justified on multiple grounds (it's easy currently
to not notice an essentially corrupted kernel build), and the commit
has been in -next testing for several weeks, but there could still be
build failures with old or weird toolchains. Should that be widespread
or high profile enough then I'd suggest a quick revert, to not hold up
the merge window"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
objtool: Re-arrange validate_functions()
objtool: Optimize find_rela_by_dest_range()
objtool: Delete cleanup()
objtool: Optimize read_sections()
objtool: Optimize find_symbol_by_name()
objtool: Resize insn_hash
objtool: Rename find_containing_func()
objtool: Optimize find_symbol_*() and read_symbols()
objtool: Optimize find_section_by_name()
objtool: Optimize find_section_by_index()
objtool: Add a statistics mode
objtool: Optimize find_symbol_by_index()
x86/kexec: Make relocate_kernel_64.S objtool clean
x86/kexec: Use RIP relative addressing
objtool: Rename func_for_each_insn_all()
objtool: Rename func_for_each_insn()
objtool: Introduce validate_return()
objtool: Improve call destination function detection
objtool: Fix clang switch table edge case
objtool: Add relocation check for alternative sections
...
Linus Torvalds [Mon, 30 Mar 2020 22:28:12 +0000 (15:28 -0700)]
Merge tag 'pnp-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull PNP subsystem updates from Rafael Wysocki:
- Update MAINTAINERS to cover include/linux/pnp.h and add the
linux-acpi list to the PNP entry in it
- add the const modifier to the name field definition in struct
pnp_driver
- drop a pointer case in the RTC CMOS driver that has become redundant
All by Corentin Labbe.
* tag 'pnp-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
MAINTAINERS: Add linux-acpi list to PNP
rtc: cmos: remove useless cast for driver_name
PNP: constify driver name
PNP: add missing include/linux/pnp.h to MAINTAINERS
* Replace list_for_each_safe() with list_for_each_entry_safe()
(chenqiwu)"
* tag 'acpi-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits)
ACPICA: Update version to 20200214
ACPI: PCI: Use scnprintf() for avoiding potential buffer overflow
ACPI: fan: Use scnprintf() for avoiding potential buffer overflow
ACPI: EC: Eliminate EC_FLAGS_QUERY_HANDSHAKE
ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add()
ACPI: EC: Simplify acpi_ec_ecdt_start() and acpi_ec_init()
ACPI: EC: Consolidate event handler installation code
acpi/x86: ignore unspecified bit positions in the ACPI global lock field
acpi/x86: add a kernel parameter to disable ACPI BGRT
x86/acpi: make "asmlinkage" part first thing in the function definition
ACPI: list_for_each_safe() -> list_for_each_entry_safe()
ACPI: video: remove redundant assignments to variable result
ACPI: OSL: Add missing __acquires/__releases annotations
ACPI / battery: Cleanup Lenovo Ideapad Miix 320 DMI table entry
ACPI / AC: Cleanup DMI quirk table
ACPI: EC: Use fast path in acpi_ec_add() for DSDT boot EC
ACPI: EC: Simplify acpi_ec_add()
ACPI: EC: Drop AE_NOT_FOUND special case from ec_install_handlers()
ACPI: EC: Avoid passing redundant argument to functions
ACPI: EC: Avoid printing confusing messages in acpi_ec_setup()
...
====================
This series adds ALU32 signed and unsigned min/max bounds.
The origins of this work is to fix do_refine_retval_range() which before
this series clamps the return value bounds to [0, max]. However, this
is not correct because its possible these functions may return negative
errors so the correct bound is [*MIN, max]. Where *MIN is the signed
and unsigned min values U64_MIN and S64_MIN. And 'max' here is the max
positive value returned by this routine.
Patch 1 changes the do_refine_retval_range() to return the correct bounds
but this breaks existing programs that were depending on the old incorrect
bound. To repair these old programs we add ALU32 bounds to properly track
the return values from these helpers. The ALU32 bounds are needed because
clang realizes these helepers return 'int' type and will use jmp32 ops
with the return value. With current state of things this does little to
help 64bit bounds and with patch 1 applied will cause many programs to
fail verifier pass. See patch 5 for trace details on how this happens.
Patch 2 does the ALU32 addition it adds the new bounds and populates them
through the verifier. Design note, initially a var32 was added but as
pointed out by Alexei and Edward it is not strictly needed so it was
removed here. This worked out nicely.
Patch 3 notes that the refine return value can now also bound the 32-bit
subregister allowing better bouinds tracking in these cases.
Patches 4 adds a C test case to test_progs which will cause the verifier
to fail if new 32bit and do_refine_retval_range() is incorrect.
Patches 5 and 6 fix test cases that broke after refining the return
values from helpers. I attempted to be explicit about each failure and
why we need the change. See patches for details.
Patch 7 adds some bounds check tests to ensure bounds checking when
mixing alu32, alu64 and jmp32 ops together.
Thanks to Alexei, Edward, and Daniel for initial feedback it helped clean
this up a lot.
v2:
- rebased to bpf-next
- fixed tnum equals optimization for combining 32->64bits
- updated patch to fix verifier test correctly
- updated refine_retval_range to set both s32_*_value and s*_value we
need both to get better bounds tracking
====================
Linus Torvalds [Mon, 30 Mar 2020 22:05:01 +0000 (15:05 -0700)]
Merge tag 'pm-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These clean up and rework the PM QoS API, address a suspend-to-idle
wakeup regression on some ACPI-based platforms, clean up and extend a
few cpuidle drivers, update multiple cpufreq drivers and cpufreq
documentation, and fix a number of issues in devfreq and several other
things all over.
Specifics:
- Clean up and rework the PM QoS API to simplify the code and reduce
the size of it (Rafael Wysocki).
- Fix a suspend-to-idle wakeup regression on Dell XPS13 9370 and
similar platforms where the USB plug/unplug events are handled by
the EC (Rafael Wysocki).
- CLean up the intel_idle and PSCI cpuidle drivers (Rafael Wysocki,
Ulf Hansson).
- Extend the haltpoll cpuidle driver so that it can be forced to run
on some systems where it refused to load (Maciej Szmigiero).
- Convert several cpufreq documents to the .rst format and move the
legacy driver documentation into one common file (Mauro Carvalho
Chehab, Rafael Wysocki).
- Update several cpufreq drivers:
* Extend and fix the imx-cpufreq-dt driver (Anson Huang).
* Improve the -EPROBE_DEFER handling and fix unwanted CPU
overclocking on i.MX6ULL in imx6q-cpufreq (Anson Huang,
Christoph Niedermaier).
* Add support for Krait based SoCs to the qcom driver (Ansuel
Smith).
* Add support for OPP_PLUS to ti-cpufreq (Lokesh Vutla).
* Add platform specific intermediate callbacks support to
cpufreq-dt and update the imx6q driver (Peng Fan).
* Simplify and consolidate some pieces of the intel_pstate
driver and update its documentation (Rafael Wysocki, Alex
Hung).
- Fix several devfreq issues:
* Remove unneeded extern keyword from a devfreq header file and
use the DEVFREQ_GOV_UPDATE_INTERNAL event name instead of
DEVFREQ_GOV_INTERNAL (Chanwoo Choi).
* Fix the handling of dev_pm_qos_remove_request() result
(Leonard Crestez).
* Use constant name for userspace governor (Pierre Kuo).
* Get rid of doc warnings and fix a typo (Christophe JAILLET).
- Use built-in RCU list checking in some places in the PM core to
avoid false-positive RCU usage warnings (Madhuparna Bhowmik).
- Fix removal of wakeup sources to avoid NULL pointer dereferences in
a corner case (Neeraj Upadhyay).
- Clean up the handling of hibernate compat ioctls and fix the
related documentation (Eric Biggers).
- Update the idle_inject power capping driver to use variable-length
arrays instead of zero-length arrays (Gustavo Silva).
- Fix list format in a PM QoS document (Randy Dunlap).
- Make the cpufreq stats module use scnprintf() to avoid potential
buffer overflows (Takashi Iwai).
- Add pm_runtime_get_if_active() to PM-runtime API (Sakari Ailus).
- Allow no domain-idle-states DT property in generic PM domains (Ulf
Hansson).
- Fix a broken y-axis scale in the intel_pstate_tracer utility (Doug
Smythies)"
* tag 'pm-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (78 commits)
cpufreq: intel_pstate: Simplify intel_pstate_cpu_init()
tools/power/x86/intel_pstate_tracer: fix a broken y-axis scale
ACPI: PM: s2idle: Refine active GPEs check
ACPICA: Allow acpi_any_gpe_status_set() to skip one GPE
PM: sleep: wakeup: Skip wakeup_source_sysfs_remove() if device is not there
PM / devfreq: Get rid of some doc warnings
PM / devfreq: Fix handling dev_pm_qos_remove_request result
PM / devfreq: Fix a typo in a comment
PM / devfreq: Change to DEVFREQ_GOV_UPDATE_INTERVAL event name
PM / devfreq: Remove unneeded extern keyword
PM / devfreq: Use constant name of userspace governor
ACPI: PM: s2idle: Fix comment in acpi_s2idle_prepare_late()
cpufreq: qcom: Add support for krait based socs
cpufreq: imx6q-cpufreq: Improve the logic of -EPROBE_DEFER handling
cpufreq: Use scnprintf() for avoiding potential buffer overflow
cpuidle: psci: Split psci_dt_cpu_init_idle()
PM / Domains: Allow no domain-idle-states DT property in genpd when parsing
PM / hibernate: Remove unnecessary compat ioctl overrides
PM: hibernate: fix docs for ioctls that return loff_t via pointer
Documentation: intel_pstate: update links for references
...
Its possible to have divergent ALU32 and ALU64 bounds when using JMP32
instructins and ALU64 arithmatic operations. Sometimes the clang will
even generate this code. Because the case is a bit tricky lets add
a specific test for it.
Here is pseudocode asm version to illustrate the idea,
The intent here is the verifier will fail the load if the 32bit bounds
are not tracked correctly through ALU64 op. Similarly we can check the
64bit bounds are correctly zero extended after ALU32 ops.
John Fastabend [Mon, 30 Mar 2020 21:38:01 +0000 (14:38 -0700)]
bpf: Test_verifier, #65 error message updates for trunc of boundary-cross
After changes to add update_reg_bounds after ALU ops and 32-bit bounds
tracking truncation of boundary crossing range will fail earlier and with
a different error message. Now the test error trace is the following
Because we have 'umin_value == umax_value' instead of previously
where 'umin_value != umax_value' we can now fail earlier noting
that pointer addition is out of bounds.
John Fastabend [Mon, 30 Mar 2020 21:37:40 +0000 (14:37 -0700)]
bpf: Test_verifier, bpf_get_stack return value add <0
With current ALU32 subreg handling and retval refine fix from last
patches we see an expected failure in test_verifier. With verbose
verifier state being printed at each step for clarity we have the
following relavent lines [I omit register states that are not
necessarily useful to see failure cause],
After call bpf_get_stack() on line 14 and some moves we have at line 16
an r8 bound with max_value 48 but an unknown min value. This is to be
expected bpf_get_stack call can only return a max of the input size but
is free to return any negative error in the 32-bit register space. The
C helper is returning an int so will use lower 32-bits.
Lines 17 and 18 clear the top 32 bits with a left/right shift but use
ARSH so we still have worst case min bound before line 19 of -2147483648.
At this point the signed check 'r1 s< r8' meant to protect the addition
on line 22 where dst reg is a map_value pointer may very well return
true with a large negative number. Then the final line 22 will detect
this as an invalid operation and fail the program. What we want to do
is proceed only if r8 is positive non-error. So change 'r1 s< r8' to
'r1 s> r8' so that we jump if r8 is negative.
Next we will throw an error because we access past the end of the map
value. The map value size is 48 and sizeof(struct test_val) is 48 so
we walk off the end of the map value on the second call to
get bpf_get_stack(). Fix this by changing sizeof(struct test_val) to
24 by using 'sizeof(struct test_val) / 2'. After this everything passes
as expected.
John Fastabend [Mon, 30 Mar 2020 21:37:19 +0000 (14:37 -0700)]
bpf: Test_progs, add test to catch retval refine error handling
Before this series the verifier would clamp return bounds of
bpf_get_stack() to [0, X] and this led the verifier to believe
that a JMP_JSLT 0 would be false and so would prune that path.
The result is anything hidden behind that JSLT would be unverified.
Add a test to catch this case by hiding an goto pc-1 behind the
check which will cause an infinite loop if not rejected.
John Fastabend [Mon, 30 Mar 2020 21:36:59 +0000 (14:36 -0700)]
bpf: Verifier, refine 32bit bound in do_refine_retval_range
Further refine return values range in do_refine_retval_range by noting
these are int return types (We will assume here that int is a 32-bit type).
Two reasons to pull this out of original patch. First it makes the original
fix impossible to backport. And second I've not seen this as being problematic
in practice unlike the other case.
John Fastabend [Mon, 30 Mar 2020 21:36:39 +0000 (14:36 -0700)]
bpf: Verifier, do explicit ALU32 bounds tracking
It is not possible for the current verifier to track ALU32 and JMP ops
correctly. This can result in the verifier aborting with errors even though
the program should be verifiable. BPF codes that hit this can work around
it by changin int variables to 64-bit types, marking variables volatile,
etc. But this is all very ugly so it would be better to avoid these tricks.
But, the main reason to address this now is do_refine_retval_range() was
assuming return values could not be negative. Once we fixed this code that
was previously working will no longer work. See do_refine_retval_range()
patch for details. And we don't want to suddenly cause programs that used
to work to fail.
The simplest example code snippet that illustrates the problem is likely
this,
The expected 64-bit and 32-bit bounds after each line are shown on the
right. The current issue is without the w* bounds we are forced to use
the worst case bound of [0, U32_MAX]. To resolve this type of case,
jmp32 creating divergent 32-bit bounds from 64-bit bounds, we add explicit
32-bit register bounds s32_{min|max}_value and u32_{min|max}_value. Then
from branch_taken logic creating new bounds we can track 32-bit bounds
explicitly.
The next case we observed is ALU ops after the jmp32,
In order to keep the bounds accurate at this point we also need to track
ALU32 ops. To do this we add explicit ALU32 logic for each of the ALU
ops, mov, add, sub, etc.
Finally there is a question of how and when to merge bounds. The cases
enumerate here,
For "MOV ALU32" BPF arch zero extends so we simply copy the bounds
from 32-bit into 64-bit ensuring we truncate var_off and 64-bit
bounds correctly. See zext_32_to_64.
For "MOV ALU64" copy all bounds including 32-bit into new register. If
the src register had 32-bit bounds the dst register will as well.
For "op ALU32" zero extend 32-bit into 64-bit the same as move,
see zext_32_to_64.
For "op ALU64" calculate both 32-bit and 64-bit bounds no merging
is done here. Except we have a special case. When RSH or ARSH is
done we can't simply ignore shifting bits from 64-bit reg into the
32-bit subreg. So currently just push bounds from 64-bit into 32-bit.
This will be correct in the sense that they will represent a valid
state of the register. However we could lose some accuracy if an
ARSH is following a jmp32 operation. We can handle this special
case in a follow up series.
For "jmp ALU32" mark 64-bit reg unknown and recalculate 64-bit bounds
from tnum by setting var_off to ((<<(>>var_off)) | var32_off). We
special case if 64-bit bounds has zero'd upper 32bits at which point
we can simply copy 32-bit bounds into 64-bit register. This catches
a common compiler trick where upper 32-bits are zeroed and then
32-bit ops are used followed by a 64-bit compare or 64-bit op on
a pointer. See __reg_combine_64_into_32().
For "jmp ALU64" cast the bounds of the 64bit to their 32-bit
counterpart. For example s32_min_value = (s32)reg->smin_value. For
tnum use only the lower 32bits via, (>>(<<var_off)). See
__reg_combine_64_into_32().
Linus Torvalds [Mon, 30 Mar 2020 21:58:26 +0000 (14:58 -0700)]
Merge tag 'regulator-spi-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc
Pull spi and regulator updates from Mark Brown:
"At one point in the release cycle I managed to fat finger things and
apply some SPI fixes onto a regulator branch and merge that into the
SPI tree, then pull in a change shared with the MTD tree moving the
Mediatek quadspi driver over to become the Mediatek spi-nor driver in
the SPI tree.
This has made a mess which I only just noticed while preparing this
and I can't see a sensible way to unpick things due to other
subsequent merge commits especially the pull from MTD so it looks like
the most sensible thing to do is give up and combine the two pull
requests.
Fortunately both subsystems were fairly quiet this cycle, the
highlights are:
regulator:
- Support for Monoloithic Power Systems MP5416, MP8867 and MPS8869
and Qualcomm PMI8994 and SMB208.
SPI:
- Lots of enhancements for spi-fsl-dspi, including XSPI mode support,
from Vladimir Oltean.
- Support for amlogic Meson G12A, IBM FSI, Mediatek spi-nor (moved
from MTD), NXP i.MX8Mx, Rockchip PX30, RK3308 and RK3328, and
Qualcomm Atheros AR934x/QCA95xx"
* tag 'regulator-spi-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc: (118 commits)
spi: efm32: Convert to use GPIO descriptors
regulator: qcom_smd: Add pmi8994 regulator support
regulator: da9063: Fix get_mode() functions to read sleep field
spi: spi-fsl-lpspi: Replace zero-length array with flexible-array member
spi: spi-s3c24xx: Replace zero-length array with flexible-array member
spi: stm32: Fix comments compilation warnings
spi: atmel-quadspi: Add verbose debug facilities to monitor register accesses
spi: spi-fsl-dspi: Add support for LS1028A
spi: spi-fsl-dspi: Move invariant configs out of dspi_transfer_one_message
spi: spi-fsl-dspi: Fix interrupt-less DMA mode taking an XSPI code path
spi: spi-fsl-dspi: Avoid NULL pointer in dspi_slave_abort for non-DMA mode
spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion
spi: spi-fsl-dspi: Protect against races on dspi->words_in_flight
spi: spi-fsl-dspi: Avoid reading more data than written in EOQ mode
spi: spi-fsl-dspi: Fix bits-per-word acceleration in DMA mode
spi: spi-fsl-dspi: Fix little endian access to PUSHR CMD and TXDATA
spi: spi-fsl-dspi: Don't access reserved fields in SPI_MCR
regulator: driver.h: fix regulator_map_* function names
regulator: da9063: fix suspend
spi: mxs: Drop GPIO includes
...
John Fastabend [Mon, 30 Mar 2020 21:36:19 +0000 (14:36 -0700)]
bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly
do_refine_retval_range() is called to refine return values from specified
helpers, probe_read_str and get_stack at the moment, the reasoning is
because both have a max value as part of their input arguments and
because the helper ensure the return value will not be larger than this
we can set smax values of the return register, r0.
However, the return value is a signed integer so setting umax is incorrect
It leads to further confusion when the do_refine_retval_range() then calls,
__reg_deduce_bounds() which will see a umax value as meaning the value is
unsigned and then assuming it is unsigned set the smin = umin which in this
case results in 'smin = 0' and an 'smax = X' where X is the input argument
from the helper call.
Here are the comments from _reg_deduce_bounds() on why this would be safe
to do.
/* Learn sign from unsigned bounds. Signed bounds cross the sign
* boundary, so we must be careful.
*/
if ((s64)reg->umax_value >= 0) {
/* Positive. We can't learn anything from the smin, but smax
* is positive, hence safe.
*/
reg->smin_value = reg->umin_value;
reg->smax_value = reg->umax_value = min_t(u64, reg->smax_value,
reg->umax_value);
But now we incorrectly have a return value with type int with the
signed bounds (0,X). Suppose the return value is negative, which is
possible the we have the verifier and reality out of sync. Among other
things this may result in any error handling code being falsely detected
as dead-code and removed. For instance the example below shows using
bpf_probe_read_str() causes the error path to be identified as dead
code and removed.
To fix omit setting the umax value because its not safe. The only
actual bounds we know is the smax. This results in the correct bounds
(SMIN, X) where X is the max length from the helper. After this the
new verifier state looks like the following after call 45.
R0=inv(id=0,smax_value=100)
Then xlated version no longer removed dead code giving the expected
result,
Note, bpf_probe_read_* calls are root only so we wont hit this case
with non-root bpf users.
v3: comment had some documentation about meta set to null case which
is not relevant here and confusing to include in the comment.
v2 note: In original version we set msize_smax_value from check_func_arg()
and propagated this into smax of retval. The logic was smax is the bound
on the retval we set and because the type in the helper is ARG_CONST_SIZE
we know that the reg is a positive tnum_const() so umax=smax. Alexei
pointed out though this is a bit odd to read because the register in
check_func_arg() has a C type of u32 and the umax bound would be the
normally relavent bound here. Pulling in extra knowledge about future
checks makes reading the code a bit tricky. Further having a signed
meta data that can only ever be positive is also a bit odd. So dropped
the msize_smax_value metadata and made it a u64 msize_max_value to
indicate its unsigned. And additionally save bound from umax value in
check_arg_funcs which is the same as smax due to as noted above tnumx_cont
and negative check but reads better. By my analysis nothing functionally
changes in v2 but it does get easier to read so that is win.
Linus Torvalds [Mon, 30 Mar 2020 21:20:41 +0000 (14:20 -0700)]
Merge tag 'staging-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here is the big staging and IIO driver pull request for 5.7-rc1.
We again end up deleting more code than we added here, thanks to
finally getting rid of the old and obsolete wireless USB stuff, and
the exfat code (which is coming in again through the vfs tree in a
much cleaner version).
But some code does come back, with the octeon drivers being found to
actually be used in the wild, so those deletions are now reverted.
Other than those major things, just loads and loads of tiny checkpatch
cleanups all over the place, along with new IIO drivers and fixes.
All have been in linux-next with no reported issues"
[ Stephen Rothwell points out some reported issues due to merge conflicts ]
* tag 'staging-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (415 commits)
staging: vt6656: Use DIV_ROUND_UP macro instead of specific code
staging: remove hp100 driver
staging: wilc1000: Use crc7 in lib/ rather than a private copy
Staging: rtl8192u: ieee80211: Use netdev_alert().
Staging: rtl8192u: ieee80211: Use netdev_info() with network devices.
Staging: rtl8192u: ieee80211: Use netdev_warn() for network devices.
Staging: rtl8192u: ieee80211: Use netdev_dbg() for debug messages.
staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
staging: rtl8723bs: hal: Remove NULL check before kfree
staging: rtl8723bs: hal: Correct typos in comments
staging: rtl8723bs: os_dep: Correct typos in comments
staging: rtl8723bs: core: Correct typos in comments
staging: rtl8723bs: hal: Remove unnecessary cast on void pointer
staging: rtl8188eu: cleanup long line in odm.c
staging: rtl8723bs: hal: Compress return logic
staging: rtl8723bs: rtw_cmd: Compress lines for immediate return
staging: rtl8723bs: rtw_efuse: Compress lines for immediate return
staging: wilc1000: remove label from examples in DT binding documentation
staging: rtl8723bs: Remove blank line before '}' brace
Staging: rtl8188eu: hal: Add space around operators
...
Linus Torvalds [Mon, 30 Mar 2020 20:59:52 +0000 (13:59 -0700)]
Merge tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core changes for 5.7-rc1.
Nothing huge in here, just lots of little firmware core changes and
use of new apis, a libfs fix, a debugfs api change, and some driver
core deferred probe rework.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (44 commits)
Revert "driver core: Set fw_devlink to "permissive" behavior by default"
driver core: Set fw_devlink to "permissive" behavior by default
driver core: Replace open-coded list_last_entry()
driver core: Read atomic counter once in driver_probe_done()
libfs: fix infoleak in simple_attr_read()
driver core: Add device links from fwnode only for the primary device
platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet
platform/x86: touchscreen_dmi: Add EFI embedded firmware info support
Input: icn8505 - Switch to firmware_request_platform for retreiving the fw
Input: silead - Switch to firmware_request_platform for retreiving the fw
selftests: firmware: Add firmware_request_platform tests
test_firmware: add support for firmware_request_platform
firmware: Add new platform fallback mechanism and firmware_request_platform()
Revert "drivers: base: power: wakeup.c: Use built-in RCU list checking"
drivers: base: power: wakeup.c: Use built-in RCU list checking
component: allow missing unbind callback
debugfs: remove return value of debugfs_create_file_size()
debugfs: Check module state before warning in {full/open}_proxy_open()
firmware: fix a double abort case with fw_load_sysfs_fallback
arch_topology: Fix putting invalid cpu clk
...
Linus Torvalds [Mon, 30 Mar 2020 20:54:11 +0000 (13:54 -0700)]
Merge tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / PHY updates from Greg KH:
"Here are the big set of USB and PHY driver patches for 5.7-rc1.
Nothing huge here, some new PHY drivers, loads of USB gadget fixes and
updates, xhci updates, usb-serial driver updates and new device ids,
and other minor things. Full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (239 commits)
USB: cdc-acm: restore capability check order
usb: cdns3: make signed 1 bit bitfields unsigned
usb: gadget: fsl: remove unused variable 'driver_desc'
usb: gadget: f_fs: Fix use after free issue as part of queue failure
usb: typec: Correct the documentation for typec_cable_put()
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
USB: serial: option: add Wistron Neweb D19Q1
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add support for ASKEY WWHC050
usb: core: Add ACPI support for USB interface devices
driver core: platform: Reimplement devm_platform_ioremap_resource
usb: dwc2: convert to devm_platform_get_and_ioremap_resource
usb: host: hisilicon: convert to devm_platform_get_and_ioremap_resource
usb: host: xhci-plat: convert to devm_platform_get_and_ioremap_resource
drivers: provide devm_platform_get_and_ioremap_resource()
phy: qcom-qusb2: Add new overriding tuning parameters in QUSB2 V2 PHY
phy: qcom-qusb2: Add support for overriding tuning parameters in QUSB2 V2 PHY
dt-bindings: phy: qcom-qusb2: Add support for overriding Phy tuning parameters
phy: qcom-qusb2: Add generic QUSB2 V2 PHY support
dt-bindings: phy: qcom,qusb2: Add compatibles for QUSB2 V2 phy and SC7180
...
====================
Introduce a new helper that allows assigning a previously-found socket
to the skb as the packet is received towards the stack, to cause the
stack to guide the packet towards that socket subject to local routing
configuration. The intention is to support TProxy use cases more
directly from eBPF programs attached at TC ingress, to simplify and
streamline Linux stack configuration in scale environments with Cilium.
Normally in ip{,6}_rcv_core(), the skb will be orphaned, dropping any
existing socket reference associated with the skb. Existing tproxy
implementations in netfilter get around this restriction by running the
tproxy logic after ip_rcv_core() in the PREROUTING table. However, this
is not an option for TC-based logic (including eBPF programs attached at
TC ingress).
This series introduces the BPF helper bpf_sk_assign() to associate the
socket with the skb on the ingress path as the packet is passed up the
stack. The initial patch in the series simply takes a reference on the
socket to ensure safety, but later patches relax this for listen
sockets.
To ensure delivery to the relevant socket, we still consult the routing
table, for full examples of how to configure see the tests in patch #5;
the simplest form of the route would look like this:
$ ip route add local default dev lo
This series is laid out as follows:
* Patch 1 extends the eBPF API to add sk_assign() and defines a new
socket free function to allow the later paths to understand when the
socket associated with the skb should be kept through receive.
* Patches 2-3 optimize the receive path to avoid taking a reference on
listener sockets during receive.
* Patches 4-5 extends the selftests with examples of the new
functionality and validation of correct behaviour.
Changes since v4:
* Fix build with CONFIG_INET disabled
* Rebase
Changes since v3:
* Use sock_gen_put() directly instead of sock_edemux() from sock_pfree()
* Commit message wording fixups
* Add acks from Martin, Lorenz
* Rebase
Changes since v2:
* Add selftests for UDP socket redirection
* Drop the early demux optimization patch (defer for more testing)
* Fix check for orphaning after TC act return
* Tidy up the tests to clean up properly and be less noisy.
Changes since v1:
* Replace the metadata_dst approach with using the skb->destructor to
determine whether the socket has been prefetched. This is much
simpler.
* Avoid taking a reference on listener sockets during receive
* Restrict assigning sockets across namespaces
* Restrict assigning SO_REUSEPORT sockets
* Fix cookie usage for socket dst check
* Rebase the tests against test_progs infrastructure
* Tidy up commit messages
====================
Lorenz Bauer [Sun, 29 Mar 2020 22:53:41 +0000 (15:53 -0700)]
selftests: bpf: Add test for sk_assign
Attach a tc direct-action classifier to lo in a fresh network
namespace, and rewrite all connection attempts to localhost:4321
to localhost:1234 (for port tests) and connections to unreachable
IPv4/IPv6 IPs to the local socket (for address tests). Includes
implementations for both TCP and UDP.
Keep in mind that both client to server and server to client traffic
passes the classifier.
Joe Stringer [Sun, 29 Mar 2020 22:53:40 +0000 (15:53 -0700)]
bpf: Don't refcount LISTEN sockets in sk_assign()
Avoid taking a reference on listen sockets by checking the socket type
in the sk_assign and in the corresponding skb_steal_sock() code in the
the transport layer, and by ensuring that the prefetch free (sock_pfree)
function uses the same logic to check whether the socket is refcounted.
Joe Stringer [Sun, 29 Mar 2020 22:53:39 +0000 (15:53 -0700)]
net: Track socket refcounts in skb_steal_sock()
Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make
the determination of whether the socket is reference counted in the case
where it is prefetched by earlier logic such as early_demux.
Joe Stringer [Sun, 29 Mar 2020 22:53:38 +0000 (15:53 -0700)]
bpf: Add socket assign support
Add support for TPROXY via a new bpf helper, bpf_sk_assign().
This helper requires the BPF program to discover the socket via a call
to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
helper takes its own reference to the socket in addition to any existing
reference that may or may not currently be obtained for the duration of
BPF processing. For the destination socket to receive the traffic, the
traffic must be routed towards that socket via local route. The
simplest example route is below, but in practice you may want to route
traffic more narrowly (eg by CIDR):
$ ip route add local default dev lo
This patch avoids trying to introduce an extra bit into the skb->sk, as
that would require more invasive changes to all code interacting with
the socket to ensure that the bit is handled correctly, such as all
error-handling cases along the path from the helper in BPF through to
the orphan path in the input. Instead, we opt to use the destructor
variable to switch on the prefetch of the socket.
Linus Torvalds [Mon, 30 Mar 2020 20:42:05 +0000 (13:42 -0700)]
Merge tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- New sensor driver: imx219
- Support for some new pixelformats
- Support for Sun8i SoC
- Added more codecs to meson vdec driver
- Prepare for removing the legacy usbvision driver by moving it to
staging. This driver has issues and use legacy core APIs. If nobody
steps up to address those, it is time for its retirement.
- Several cleanups and improvements on drivers, with the addition of
new supported boards
* tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (236 commits)
media: venus: firmware: Ignore secure call error on first resume
media: mtk-vpu: load vpu firmware from the new location
media: i2c: video-i2c: fix build errors due to 'imply hwmon'
media: MAINTAINERS: add myself to co-maintain Hantro G1/G2 for i.MX8MQ
media: hantro: add initial i.MX8MQ support
media: dt-bindings: Document i.MX8MQ VPU bindings
media: vivid: fix incorrect PA assignment to HDMI outputs
media: hantro: Add linux-rockchip mailing list to MAINTAINERS
media: cedrus: h264: Fix 4K decoding on H6
media: siano: Use scnprintf() for avoiding potential buffer overflow
media: rc: Use scnprintf() for avoiding potential buffer overflow
media: allegro: create new struct for channel parameters
media: allegro: move mail definitions to separate file
media: allegro: pass buffers through firmware
media: allegro: verify source and destination buffer in VCU response
media: allegro: handle dependency of bitrate and bitrate_peak
media: allegro: read bitrate mode directly from control
media: allegro: make QP configurable
media: allegro: make frame rate configurable
media: allegro: skip filler data if possible
...
Daniel Borkmann [Mon, 30 Mar 2020 20:38:54 +0000 (22:38 +0200)]
bpf, doc: Add John as official reviewer to BPF subsystem
We've added John Fastabend to our weekly BPF patch review rotation over
last months now where he provided excellent and timely feedback on BPF
patches. Therefore, add him to the BPF core reviewer team to the MAINTAINERS
file to reflect that.
Linus Torvalds [Mon, 30 Mar 2020 20:34:25 +0000 (13:34 -0700)]
Merge tag 'hwmon-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- New driver for AXI fan control
- Attenuator bypass support and support for inverting pwm output in
adt7475 driver
- Support for new power supply version in ibm-cffps driver
- PMBus drivers:
* support for multi-phase chips
* ltc2978 driver: add support for LTC2972, LTC2979, LTC3884,
LTC3889, LTC7880, LTM4664, LTM4677, LTM4678, LTM4680, and
LTM4700/
* tps53679 driver: add support for TPS53681, TPS53647, and TPS53667
* isl68137 driver: support for various 2nd Gen Renesas digital
multiphase chips added to isl68137 driver
- Minor improvements and fixes in nct7904, ibmpowernv, lm73, ibmaem,
and k10temp drivers
* tag 'hwmon-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (29 commits)
docs: hwmon: Update documentation for isl68137 pmbus driver
hwmon: (pmbus) add support for 2nd Gen Renesas digital multiphase
hwmon: (pmbus/ibm-cffps) Add another PSU CCIN to version detection
hwmon: (nct7904) Fix the incorrect quantity for fan & temp attributes
hwmon: (ibmpowernv) Use scnprintf() for avoiding potential buffer overflow
hwmon: (adt7475) Add support for inverting pwm output
hwmon: (adt7475) Add attenuator bypass support
dt-bindings: hwmon: Document adt7475 pwm-active-state property
dt-bindings: hwmon: Document adt7475 bypass-attenuator property
dt-bindings: hwmon: Document adt7475 binding
hwmon: (lm73) Add support for of_match_table
dt-bindings: Add TI LM73 as a trivial device
hwmon: (pmbus/tps53679) Add documentation
hwmon: (pmbus/tps53679) Add support for TPS53647 and TPS53667
hwmon: (pmbus/tps53679) Add support for TPS53681
hwmon: (pmbus/tps53679) Add support for IIN and PIN to TPS53679 and TPS53688
hwmon: (pmbus/tps53679) Add support for multiple chips IDs
hwmon: (pmbus) Implement multi-phase support
hwmon: (pmbus) Add 'phase' parameter where needed for multi-phase support
hwmon: (pmbus) Add IC_DEVICE_ID and IC_DEVICE_REV command definitions
...
KP Singh [Mon, 30 Mar 2020 14:42:46 +0000 (16:42 +0200)]
bpf: btf: Fix arg verification in btf_ctx_access()
The bounds checking for the arguments accessed in the BPF program breaks
when the expected_attach_type is not BPF_TRACE_FEXIT, BPF_LSM_MAC or
BPF_MODIFY_RETURN resulting in no check being done for the default case
(the programs which do not receive the return value of the attached
function in its arguments) when the index of the argument being accessed
is equal to the number of arguments (nr_args).
This was a result of a misplaced "else if" block introduced by the
Commit 6ba43b761c41 ("bpf: Attachment verification for
BPF_MODIFY_RETURN")
Linus Torvalds [Mon, 30 Mar 2020 20:17:50 +0000 (13:17 -0700)]
Merge tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Do not report spurious MCEs on some Intel platforms caused by errata;
by Prarit Bhargava.
- Change dev-mcelog's hardcoded limit of 32 error records to a dynamic
one, controlled by the number of logical CPUs, by Tony Luck.
- Add support for the processor identification number (PPIN) on AMD, by
Wei Huang.
* tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/amd: Add PPIN support for AMD MCE
x86/mce/dev-mcelog: Dynamically allocate space for machine check records
x86/mce: Do not log spurious corrected mce errors
Linus Torvalds [Mon, 30 Mar 2020 20:12:37 +0000 (13:12 -0700)]
Merge tag 'edac_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- A substantial edac_mc cleanup, sanitizing object freeing,
streamlining and simplifying code flow, and getting rid of a lot of
needless complexity in memory controller representation code, by
Robert Richter.
- A new EDAC driver for the ARM DMC-520 memory controller, by Lei Wang,
Shiping Ji and others.
- The usual sprinkling of misc cleanups and fixes all over the
subsystem.
* tag 'edac_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/armada_xp: Use scnprintf() for avoiding potential buffer overflow
EDAC/synopsys: Do not dump uninitialized pinf->col
EDAC: Add EDAC driver for DMC520
dt-bindings: edac: Dmc-520.yaml
EDAC/mce_amd: Print !SMCA processor warning only once
EDAC/mc: Remove per layer counters
EDAC/mc: Remove detail[] string and cleanup error string generation
EDAC/mc: Pass the error descriptor to error reporting functions
EDAC/mc: Remove enable_per_layer_report function argument
EDAC/mc: Report "unknown memory" on too many DIMM labels found
EDAC/mc: Carve out error increment into a separate function
EDAC/mc: Determine mci pointer from the error descriptor
EDAC: Store error type in struct edac_raw_error_desc
EDAC/mc: Reorder functions edac_mc_alloc*()
EDAC/mc: Split edac_mc_alloc() into smaller functions
EDAC/mc: Change mci device removal to use put_device()
Linus Torvalds [Mon, 30 Mar 2020 20:09:34 +0000 (13:09 -0700)]
Merge tag 'pstore-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
"These mostly some minor cleanups and a bug fix for an ftrace corner
case:
- Improve failure paths (chenqiwu)
- Fix ftrace position index (Vasily Averin)
- Use proper flexible-array member (Gustavo A. R. Silva)"
* tag 'pstore-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Replace zero-length array with flexible-array member
pstore: pstore_ftrace_seq_next should increase position index
pstore/ram: remove unnecessary ramoops_unregister_dummy()
pstore/platform: fix potential mem leak if pstore_init_fs failed
Linus Torvalds [Mon, 30 Mar 2020 19:53:56 +0000 (12:53 -0700)]
Merge tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"A couple of seccomp updates. They're both mostly bug fixes that I
wanted to have sit in linux-next for a while:
- allow TSYNC and USER_NOTIF together (Tycho Andersen)
- add missing compat_ioctl for notify (Sven Schnelle)"
* tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Add missing compat_ioctl for notify
seccomp: allow TSYNC and USER_NOTIF together
Linus Torvalds [Mon, 30 Mar 2020 19:49:33 +0000 (12:49 -0700)]
Merge tag 'erofs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"Updates with a XArray adaptation, several fixes for shrinker and
corrupted images are ready for this cycle.
All commits have been stress tested with no noticeable smoke out and
have been in linux-next as well.
Summary:
- Convert radix tree usage to XArray
- Fix shrink scan count on multiple filesystem instances
- Better handling for specific corrupted images
- Update my email address in MAINTAINERS"
* tag 'erofs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
MAINTAINERS: erofs: update my email address
erofs: handle corrupted images whose decompressed size less than it'd be
erofs: use LZ4_decompress_safe() for full decoding
erofs: correct the remaining shrink objects
erofs: convert workstn to XArray
Linus Torvalds [Mon, 30 Mar 2020 19:45:23 +0000 (12:45 -0700)]
Merge tag 'docs-5.7' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This has been a busy cycle for documentation work.
Highlights include:
- Lots of RST conversion work by Mauro, Daniel ALmeida, and others.
Maybe someday we'll get to the end of this stuff...maybe...
- Some organizational work to bring some order to the core-api
manual.
- Various new docs and additions to the existing documentation.
- Typo fixes, warning fixes, ..."
* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
MAINTAINERS: adjust to filesystem doc ReST conversion
docs: deprecated.rst: Add BUG()-family
doc: zh_CN: add translation for virtiofs
doc: zh_CN: index files in filesystems subdirectory
docs: locking: Drop :c:func: throughout
docs: locking: Add 'need' to hardirq section
docs: conf.py: avoid thousands of duplicate label warning on Sphinx
docs: prevent warnings due to autosectionlabel
docs: fix reference to core-api/namespaces.rst
docs: fix pointers to io-mapping.rst and io_ordering.rst files
Documentation: Better document the softlockup_panic sysctl
docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
docs: perf: imx-ddr.rst: get rid of a warning
docs: filesystems: fuse.rst: supress a Sphinx warning
docs: translations: it: avoid duplicate refs at programming-language.rst
docs: driver.rst: supress two ReSt warnings
docs: trace: events.rst: convert some new stuff to ReST format
Documentation: Add io_ordering.rst to driver-api manual
Documentation: Add io-mapping.rst to driver-api manual
...
Linus Torvalds [Mon, 30 Mar 2020 19:18:49 +0000 (12:18 -0700)]
Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Here are the io_uring changes for this merge window. Light on new
features this time around (just splice + buffer selection), lots of
cleanups, fixes, and improvements to existing support. In particular,
this contains:
- Cleanup fixed file update handling for stack fallback (Hillf)
- Re-work of how pollable async IO is handled, we no longer require
thread offload to handle that. Instead we rely using poll to drive
this, with task_work execution.
- In conjunction with the above, allow expendable buffer selection,
so that poll+recv (for example) no longer has to be a split
operation.
- Make sure we honor RLIMIT_FSIZE for buffered writes
- Add support for splice (Pavel)
- Linked work inheritance fixes and optimizations (Pavel)
* tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits)
io_uring: cleanup io_alloc_async_ctx()
io_uring: fix missing 'return' in comment
io-wq: handle hashed writes in chains
io-uring: drop 'free_pfile' in struct io_file_put
io-uring: drop completion when removing file
io_uring: Fix ->data corruption on re-enqueue
io-wq: close cancel gap for hashed linked work
io_uring: make spdxcheck.py happy
io_uring: honor original task RLIMIT_FSIZE
io-wq: hash dependent work
io-wq: split hashing and enqueueing
io-wq: don't resched if there is no work
io-wq: remove duplicated cancel code
io_uring: fix truncated async read/readv and write/writev retry
io_uring: dual license io_uring.h uapi header
io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled
io_uring: Fix unused function warnings
io_uring: add end-of-bits marker and build time verify it
io_uring: provide means of removing buffers
io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG
...
Haishuang Yan [Mon, 30 Mar 2020 03:20:15 +0000 (11:20 +0800)]
ipvs: fix uninitialized variable warning
If outer_proto is not set, GCC warning as following:
In file included from net/netfilter/ipvs/ip_vs_core.c:52:
net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp':
include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized]
233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \
| ^~~~~~
net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here
1666 | char *outer_proto;
| ^~~~~~~~~~~
Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors") Signed-off-by: Haishuang Yan <[email protected]> Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
netfilter: nft_exthdr: fix endianness of tcp option cast
I got a problem on MIPS with Big-Endian is turned on: every time when
NF trying to change TCP MSS it returns because of new.v16 was greater
than old.v16. But real MSS was 1460 and my rule was like this:
add rule table chain tcp option maxseg size set 1400
And 1400 is lesser that 1460, not greater.
Later I founded that main causer is cast from u32 to __be16.
Debugging:
In example MSS = 1400(HEX: 0x578). Here is representation of each byte
like it is in memory by addresses from left to right(e.g. [0x0 0x1 0x2
0x3]). LE — Little-Endian system, BE — Big-Endian, left column is type.
LE BE
u32: [78 05 00 00] [00 00 05 78]
As you can see, u32 representation will be casted to u16 from different
half of 4-byte address range. But actually nf_tables uses registers and
store data of various size. Actually TCP MSS stored in 2 bytes. But
registers are still u32 in definition:
So, access like regs->data[priv->sreg] exactly u32. So, according to
table presents above, per-byte representation of stored TCP MSS in
register will be:
LE BE
(u32)regs->data[]: [78 05 00 00] [05 78 00 00]
^^ ^^
We see that register uses just half of u32 and other 2 bytes may be
used for some another data. But in nft_exthdr_tcp_set_eval() it casted
just like u32 -> __be16:
new.v16 = src
But u32 overfill __be16, so it get 2 low bytes. For clarity draw
one more table(<xx xx> means that bytes will be used for cast).
LE BE
u32: [<78 05> 00 00] [00 00 <05 78>]
(u32)regs->data[]: [<78 05> 00 00] [05 78 <00 00>]
As you can see, for Little-Endian nothing changes, but for Big-endian we
take the wrong half. In my case there is some other data instead of
zeros, so new MSS was wrongly greater.
For shooting this bug I used solution for ports ranges. Applying of this
patch does not affect Little-Endian systems.
Jann Horn [Mon, 30 Mar 2020 16:03:24 +0000 (18:03 +0200)]
bpf: Simplify reg_set_min_max_inv handling
reg_set_min_max_inv() contains exactly the same logic as reg_set_min_max(),
just flipped around. While this makes sense in a cBPF verifier (where ALU
operations are not symmetric), it does not make sense for eBPF.
Replace reg_set_min_max_inv() with a helper that flips the opcode around,
then lets reg_set_min_max() do the complicated work.
Jann Horn [Mon, 30 Mar 2020 16:03:23 +0000 (18:03 +0200)]
bpf: Fix tnum constraints for 32-bit comparisons
The BPF verifier tried to track values based on 32-bit comparisons by
(ab)using the tnum state via 581738a681b6 ("bpf: Provide better register
bounds after jmp32 instructions"). The idea is that after a check like
this:
if ((u32)r0 > 3)
exit
We can't meaningfully constrain the arithmetic-range-based tracking, but
we can update the tnum state to (value=0,mask=0xffff'ffff'0000'0003).
However, the implementation from 581738a681b6 didn't compute the tnum
constraint based on the fixed operand, but instead derives it from the
arithmetic-range-based tracking. This means that after the following
sequence of operations:
if (r0 >= 0x1'0000'0001)
exit
if ((u32)r0 > 7)
exit
The verifier assumed that the lower half of r0 is in the range (0, 0)
and apply the tnum constraint (value=0,mask=0xffff'ffff'0000'0000) thus
causing the overall tnum to be (value=0,mask=0x1'0000'0000), which was
incorrect. Provide a fixed implementation.
The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we're
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
Taking different examples to make the issue more obvious: in this example
we're probing bounds on a completely unknown scalar variable in r1:
We're first probing lower/upper bounds via jmp64, later we do a similar
check via jmp32 and examine the resulting var_off there. After fall-through
in insn 14, we get the following bounded r1 with 0x7fffffffff unknown marked
bits in the variable section.
Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000:
The lower/upper bounds haven't changed since they have high bits set in
u64 space and the jmp32 tests can only refine bounds in the low bits.
However, for the var part the expectation would have been 0x7f000007ff
or something less precise up to 0x7fffffffff. A outcome of 0x7f00000000
is not correct since it would contradict the earlier probed bounds
where we know that the result should have been in [0x200,0x400] in u32
space. Therefore, tests with such info will lead to wrong verifier
assumptions later on like falsely predicting conditional jumps to be
always taken, etc.
The issue here is that __reg_bound_offset32()'s implementation from
commit 581738a681b6 ("bpf: Provide better register bounds after jmp32
instructions") makes an incorrect range assumption:
In the above walk-through example, __reg_bound_offset32() as-is chose
a range after masking with 0xffffffff of [0x0,0x0] since umin:0x2000000000
and umax:0x4000000000 and therefore the lo32 part was clamped to 0x0 as
well. However, in the umin:0x2000000000 and umax:0x4000000000 range above
we'd end up with an actual possible interval of [0x0,0xffffffff] for u32
space instead.
In case of the original reproducer, the situation looked as follows at
insn 5 for r0:
After the fall-through, we similarly forced the var_off result into
the wrong range [0x30303030,0x3030302f] suggesting later on that fixed
bits must only be of 0x30303020 with 0x10000001f unknowns whereas such
assumption can only be made when both bounds in hi32 range match.
Originally, I was thinking to fix this by moving reg into a temp reg and
use proper coerce_reg_to_size() helper on the temp reg where we can then
based on that define the range tnum for later intersection:
In the case of the concrete example, this gives us a more conservative unknown
section. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and
w1 <= 0x400 and w1 >= 0x200:
However, above new __reg_bound_offset32() has no effect on refining the
knowledge of the register contents. Meaning, if the bounds in hi32 range
mismatch we'll get the identity function given the range reg spans
[0x0,0xffffffff] and we cast var_off into lo32 only to later on binary
or it again with the hi32.
Likewise, if the bounds in hi32 range match, then we mask both bounds
with 0xffffffff, use the resulting umin/umax for the range to later
intersect the lo32 with it. However, _prior_ called __reg_bound_offset()
did already such intersection on the full reg and we therefore would only
repeat the same operation on the lo32 part twice.
Given this has no effect and the original commit had false assumptions,
this patch reverts the code entirely which is also more straight forward
for stable trees: apparently 581738a681b6 got auto-selected by Sasha's
ML system and misclassified as a fix, so it got sucked into v5.4 where
it should never have landed. A revert is low-risk also from a user PoV
since it requires a recent kernel and llc to opt-into -mcpu=v3 BPF CPU
to generate jmp32 instructions. A proper bounds refinement would need a
significantly more complex approach which is currently being worked, but
no stable material [0]. Hence revert is best option for stable. After the
revert, the original reported program gets rejected as follows:
David S. Miller [Mon, 30 Mar 2020 18:52:28 +0000 (11:52 -0700)]
Merge branch 'split-phylink-PCS-operations'
Russell King says:
====================
split phylink PCS operations
This series splits the phylink_mac_ops structure so that PCS can be
supported separately with their own PCS operations, separating them
from the MAC layer. This may need adaption later as more users come
along.
v2: change pcs_config() and associated called function prototypes to
only pass the information that is required, and add some documention.
Russell King [Mon, 30 Mar 2020 17:44:55 +0000 (18:44 +0100)]
net: phylink: add separate pcs operations structure
Add a separate set of PCS operations, which MAC drivers can use to
couple phylink with their associated MAC PCS layer. The PCS
operations include:
- pcs_get_state() - reads the link up/down, resolved speed, duplex
and pause from the PCS.
- pcs_config() - configures the PCS for the specified mode, PHY
interface type, and setting the advertisement.
- pcs_an_restart() - restarts 802.3 in-band negotiation with the
link partner
- pcs_link_up() - informs the PCS that link has come up, and the
parameters of the link. Link parameters are used to program the
PCS for fixed speed and non-inband modes.
Russell King [Mon, 30 Mar 2020 17:44:50 +0000 (18:44 +0100)]
net: phylink: rename 'ops' to 'mac_ops'
Rename the bland 'ops' member of struct phylink to be a more
descriptive 'mac_ops' - this is necessary as we're about to introduce
another set of operations.
Heiner Kallweit [Sun, 29 Mar 2020 23:53:39 +0000 (01:53 +0200)]
r8169: factor out rtl8169_tx_map
Factor out mapping the tx skb to a new function rtl8169_tx_map(). This
allows to remove redundancies, and rtl8169_get_txd_opts1() has only
one user left, so it can be inlined.
As a result rtl8169_xmit_frags() is significantly simplified, and in
rtl8169_start_xmit() the code is simplified and better readable.
No functional change intended.
Here are a few more Bluetooth patches for the 5.7 kernel:
- Fix assumption of encryption key size when reading fails
- Add support for DEFER_SETUP with L2CAP Enhanced Credit Based Mode
- Fix issue with auto-connected devices
- Fix suspend handling when entering the state fails
====================
Yuval Basson [Sun, 29 Mar 2020 17:32:49 +0000 (20:32 +0300)]
qed: Fix use after free in qed_chain_free
The qed_chain data structure was modified in
commit 1a4a69751f4d ("qed: Chain support for external PBL") to support
receiving an external pbl (due to iWARP FW requirements).
The pages pointed to by the pbl are allocated in qed_chain_alloc
and their virtual address are stored in an virtual addresses array to
enable accessing and freeing the data. The physical addresses however
weren't stored and were accessed directly from the external-pbl
during free.
Destroy-qp flow, leads to freeing the external pbl before the chain is
freed, when the chain is freed it tries accessing the already freed
external pbl, leading to a use-after-free. Therefore we need to store
the physical addresses in additional to the virtual addresses in a
new data structure.
Fixes: 1a4a69751f4d ("qed: Chain support for external PBL") Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Bason <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Heiner Kallweit [Sun, 29 Mar 2020 16:28:45 +0000 (18:28 +0200)]
r8169: improve handling of TD_MSS_MAX
If the mtu is greater than TD_MSS_MAX, then TSO is disabled, see
rtl8169_fix_features(). Because mss is less than mtu, we can't have
the case mss > TD_MSS_MAX in the TSO path.
David S. Miller [Mon, 30 Mar 2020 18:44:01 +0000 (11:44 -0700)]
Merge branch 'Port-and-flow-policers-for-DSA'
Vladimir Oltean says:
====================
Port and flow policers for DSA (SJA1105, Felix/Ocelot)
This series adds support for 2 types of policers:
- port policers, via tc matchall filter
- flow policers, via tc flower filter
for 2 DSA drivers:
- sja1105
- felix/ocelot
First we start with ocelot/felix. Prior to this patch, the ocelot core
library currently only supported:
- Port policers
- Flow-based dropping and trapping
But the felix wrapper could not actually use the port policers due to
missing linkage and support in the DSA core. So one of the patches
addresses exactly that limitation by adding the missing support to the
DSA core. The other patch for felix flow policers (via the VCAP IS2
engine) is actually in the ocelot library itself, since the linkage with
the ocelot flower classifier has already been done in an earlier patch
set.
Then with the newly added .port_policer_add and .port_policer_del, we
can also start supporting the L2 policers on sja1105.
Then, for full functionality of these L2 policers on sja1105, we also
implement a more limited set of flow-based policing keys for this
switch, namely for broadcast and VLAN PCP.
Series version 1 was submitted here:
https://patchwork.ozlabs.org/cover/1263353/
Nothing functional changed in v2, only a rebase.
====================
Vladimir Oltean [Sun, 29 Mar 2020 11:52:02 +0000 (14:52 +0300)]
net: dsa: sja1105: add broadcast and per-traffic class policers
This patch adds complete support for manipulating the L2 Policing Tables
from this switch. There are 45 table entries, one entry per each port
and traffic class, and one dedicated entry for broadcast traffic for
each ingress port.
Policing entries are shareable, and we use this functionality to support
shared block filters.
We are modeling broadcast policers as simple tc-flower matches on
dst_mac. As for the traffic class policers, the switch only deduces the
traffic class from the VLAN PCP field, so it makes sense to model this
as a tc-flower match on vlan_prio.
How to limit broadcast traffic coming from all front-panel ports to a
cumulated total of 10 Mbit/s:
Vladimir Oltean [Sun, 29 Mar 2020 11:52:01 +0000 (14:52 +0300)]
net: dsa: sja1105: add configuration of port policers
This adds partial configuration support for the L2 Policing Table. Out
of the 45 policing entries, only 5 are used (one for each port), in a
shared manner. All 8 traffic classes, and the broadcast policer, are
redirected to a common instance which belongs to the ingress port.
Vladimir Oltean [Sun, 29 Mar 2020 11:52:00 +0000 (14:52 +0300)]
net: dsa: felix: add port policers
This patch is a trivial passthrough towards the ocelot library, which
support port policers since commit 2c1d029a017f ("net: mscc: ocelot:
Implement port policers via tc command").
Some data structure conversion between the DSA core and the Ocelot
library is necessary, for policer parameters.
Vladimir Oltean [Sun, 29 Mar 2020 11:51:59 +0000 (14:51 +0300)]
net: dsa: add port policers
The approach taken to pass the port policer methods on to drivers is
pragmatic. It is similar to the port mirroring implementation (in that
the DSA core does all of the filter block interaction and only passes
simple operations for the driver to implement) and dissimilar to how
flow-based policers are going to be implemented (where the driver has
full control over the flow_cls_offload data structure).
Xiaoliang Yang [Sun, 29 Mar 2020 11:51:57 +0000 (14:51 +0300)]
net: mscc: ocelot: add action of police on vcap_is2
Ocelot has 384 policers that can be allocated to ingress ports,
QoS classes per port, and VCAP IS2 entries. ocelot_police.c
supports to set policers which can be allocated to police action
of VCAP IS2. We allocate policers from maximum pol_id, and
decrease the pol_id when add a new vcap_is2 entry which is
police action.
Linus Torvalds [Mon, 30 Mar 2020 18:43:51 +0000 (11:43 -0700)]
Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- floppy driver cleanup series from Willy
- NVMe updates and fixes (Various)
- null_blk trace improvements (Chaitanya)
- bcache fixes (Coly)
- md fixes (via Song)
- loop block size change optimizations (Martijn)
- scnprintf() use (Takashi)
* tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
null_blk: add trace in null_blk_zoned.c
null_blk: add tracepoint helpers for zoned mode
block: add a zone condition debug helper
nvme: cleanup namespace identifier reporting in nvme_init_ns_head
nvme: rename __nvme_find_ns_head to nvme_find_ns_head
nvme: refactor nvme_identify_ns_descs error handling
nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
nvme: Fix controller creation races with teardown flow
nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
nvme: Fix ctrl use-after-free during sysfs deletion
nvme-pci: Re-order nvme_pci_free_ctrl
nvme: Remove unused return code from nvme_delete_ctrl_sync
nvme: Use nvme_state_terminal helper
nvme: release ida resources
nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
nvmet-tcp: optimize tcp stack TX when data digest is used
nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
nvme-multipath: do not reset on unknown status
nvmet-rdma: allocate RW ctxs according to mdts
...
David S. Miller [Mon, 30 Mar 2020 18:40:50 +0000 (11:40 -0700)]
Merge branch 'ionic-support-for-firmware-upgrade'
Shannon Nelson says:
====================
ionic support for firmware upgrade
The Pensando Distributed Services Card can get firmware upgrades from
the off-host centralized management suite, and can be upgraded without a
host reboot or driver reload. This patchset sets up the support for fw
upgrade in the Linux driver.
When the upgrade begins, the DSC first brings the link down, then stops
the firmware. The driver will notice this and quiesce itself by stopping
the queues and releasing DMA resources, then monitoring for firmware to
start back up. When the upgrade is finished the firmware is restarted
and link is brought up, and the driver rebuilds the queues and restarts
traffic flow.
First we separate the Link state from the netdev state, then reorganize a
few things to prepare for partial tear-down of the queues. Next we fix
up the state machine so that we take the Tx and Rx queues down and back
up when we get LINK_DOWN and LINK_UP events. Lastly, we add handling of
the FW reset itself by tearing down the lif internals and rebuilding them
with the new FW setup.
v2: This changes the design from (ab)using the full .ndo_stop and
.ndo_open routines to getting a better separation between the
alloc and the init functions so that we can keep our resource
allocations as long as possible.
====================
Shannon Nelson [Sat, 28 Mar 2020 03:14:48 +0000 (20:14 -0700)]
ionic: remove lifs on fw reset
When the FW RESET event comes to the driver from the firmware,
or the fw_status goes to 0 (stopped) or to 0xff (no PCI
connection), then shut down the driver activity. This event
signals a FW upgrade where we need to quiesce all operations and
wait for the FW to restart. The FW will continue the update
process once it sees all the LIFs are reset. When the update
process is done it will set the fw_status back to RUNNING.
Meanwhile, the heartbeat check continues and when the fw_status
is seen as set to running we can restart the driver operations.
Shannon Nelson [Sat, 28 Mar 2020 03:14:47 +0000 (20:14 -0700)]
ionic: disable the queues on link down
When the link goes down, we need to disable the queues on the
NIC in addition to stopping the netdev stack. This lets the
FW know that the driver has stopped queue activity, and then
the FW can do internal reconfiguration work, whether actually
Link related, or for other internal FW needs. To do this,
we pull out the queue enable and disable from ionic_open()
and ionic_stop() so they can be used by other routines.
To help keep things sane, we swap the queue enables so that
the rx queue and its napi are enabled before the tx queue
which rides on the rx queues napi.
We also drop the ionic_lif_quiesce() as it doesn't do anything
more than what the queue disable has already taken care of.
Shannon Nelson [Sat, 28 Mar 2020 03:14:43 +0000 (20:14 -0700)]
ionic: move debugfs add/delete to match alloc/free
Move the qcq debugfs add to the queue alloc, and likewise move
the debugfs delete to the queue free. The LIF debugfs add
also needs to be moved, but the del is already in the LIF free.
Shannon Nelson [Sat, 28 Mar 2020 03:14:41 +0000 (20:14 -0700)]
ionic: decouple link message from netdev state
Rearrange the link_up/link_down messages so that we announce
link up when we first notice that the link is up when the
driver loads, and decouple the link_up/link_down messages from
the UP and DOWN netdev state.
Ido Schimmel [Mon, 30 Mar 2020 18:08:20 +0000 (21:08 +0300)]
mlxsw: spectrum_ptp: Fix build warnings
Cited commit extended the enums 'hwtstamp_tx_types' and
'hwtstamp_rx_filters' with values that were not accounted for in the
switch statements, resulting in the build warnings below.
Fix by adding a default case.
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c: In function ‘mlxsw_sp_ptp_get_message_types’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:915:2: warning: enumeration value ‘__HWTSTAMP_TX_CNT’ not handled in switch [-Wswitch]
915 | switch (tx_type) {
| ^~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:927:2: warning: enumeration value ‘__HWTSTAMP_FILTER_CNT’ not handled in switch [-Wswitch]
927 | switch (rx_filter) {
| ^~~~~~
Fixes: f76510b458a5 ("ethtool: add timestamping related string sets") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: David S. Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Mon, 30 Mar 2020 18:20:13 +0000 (11:20 -0700)]
Merge tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Online capacity resizing (Balbir)
- Number of hardware queue change fixes (Bart)
- null_blk fault injection addition (Bart)
- Cleanup of queue allocation, unifying the node/no-node API
(Christoph)
- Cleanup of genhd, moving code to where it makes sense (Christoph)
- Cleanup of the partition handling code (Christoph)
- disk stat fixes/improvements (Konstantin)
- BFQ improvements (Paolo)
- Various fixes and improvements
* tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
block: return NULL in blk_alloc_queue() on error
block: move bio_map_* to blk-map.c
Revert "blkdev: check for valid request queue before issuing flush"
block: simplify queue allocation
bcache: pass the make_request methods to blk_queue_make_request
null_blk: use blk_mq_init_queue_data
block: add a blk_mq_init_queue_data helper
block: move the ->devnode callback to struct block_device_operations
block: move the part_stat* helpers from genhd.h to a new header
block: move block layer internals out of include/linux/genhd.h
block: move guard_bio_eod to bio.c
block: unexport get_gendisk
block: unexport disk_map_sector_rcu
block: unexport disk_get_part
block: mark part_in_flight and part_in_flight_rw static
block: mark block_depr static
block: factor out requeue handling from dispatch code
block/diskstats: replace time_in_queue with sum of request times
block/diskstats: accumulate all per-cpu counters in one pass
block/diskstats: more accurate approximation of io_ticks for slow disks
...
====================
Devlink health auto attributes refactor
This patchset refactors the auto-recover health reporter flag to be
explicitly set by the devlink core.
In addition, add another flag to control auto-dump attribute, also
to be explicitly set by the devlink core.
For that, patch 0001 changes the auto-recover default value of
netdevsim dummy reporter.
After reporter registration, both flags can be altered be administrator
only.
Changes since v1:
- Change default behaviour of netdevsim dummy reporter
- Move initialization of DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP
====================
Eran Ben Elisha [Sun, 29 Mar 2020 11:05:55 +0000 (14:05 +0300)]
devlink: Add auto dump flag to health reporter
On low memory system, run time dumps can consume too much memory. Add
administrator ability to disable auto dumps per reporter as part of the
error flow handle routine.
This attribute is not relevant while executing
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
By default, auto dump is activated for any reporter that has a dump method,
as part of the reporter registration to devlink.
Eran Ben Elisha [Sun, 29 Mar 2020 11:05:54 +0000 (14:05 +0300)]
devlink: Implicitly set auto recover flag when registering health reporter
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.
Remove this flag from all drivers that called
devlink_health_reporter_create.
All existing health reporters set auto recovery to true if they have a
recover method.
Yet, administrator can unset auto recover via netlink command as prior to
this patch.
Eran Ben Elisha [Sun, 29 Mar 2020 11:05:53 +0000 (14:05 +0300)]
netdevsim: Change dummy reporter auto recover default
Health reporters should be registered with auto recover set to true.
Align dummy reporter behaviour with that, as in later patch the option to
set auto recover behaviour will be removed.
In addition, align netdevsim selftest to the new default value.
Richard Cochran [Sun, 29 Mar 2020 14:55:10 +0000 (07:55 -0700)]
ptp: Avoid deadlocks in the programmable pin code.
The PTP Hardware Clock (PHC) subsystem offers an API for configuring
programmable pins. User space sets or gets the settings using ioctls,
and drivers verify dialed settings via a callback. Drivers may also
query pin settings by calling the ptp_find_pin() method.
Although the core subsystem protects concurrent access to the pin
settings, the implementation places illogical restrictions on how
drivers may call ptp_find_pin(). When enabling an auxiliary function
via the .enable(on=1) callback, drivers may invoke the pin finding
method, but when disabling with .enable(on=0) drivers are not
permitted to do so. With the exception of the mv88e6xxx, all of the
PHC drivers do respect this restriction, but still the locking pattern
is both confusing and unnecessary.
This patch changes the locking implementation to allow PHC drivers to
freely call ptp_find_pin() from their .enable() and .verify()
callbacks.
V2 ChangeLog:
- fixed spelling in the kernel doc
- add Vladimir's tested by tag
Linus Torvalds [Mon, 30 Mar 2020 18:10:08 +0000 (11:10 -0700)]
Merge tag 'for-5.7/libata-2020-03-29' of git://git.kernel.dk/linux-block
Pull libata updates from Jens Axboe:
- Series from Bart, making the libata code smaller on PATA only setups.
This is useful for smaller/embedded use cases, and will help us move
some of those off drivers/ide.
- Kill unused BPRINTK() (Hannes)
- Add various Comet Lake ahci PCI ids (Kai-Heng, Mika)
- Fix for a double scsi_host_put() in error handling (John)
- Use scnprintf (Takashi)
- Assign OF node to the SCSI device (Linus Walleij)
* tag 'for-5.7/libata-2020-03-29' of git://git.kernel.dk/linux-block: (36 commits)
ata: make "libata.force" kernel parameter optional
ata: move ata_eh_analyze_ncq_error() & co. to libata-sata.c
ata: start separating SATA specific code from libata-eh.c
ata: move ata_sas_*() to libata-sata.c
ata: start separating SATA specific code from libata-scsi.c
ata: move sata_deb_timing_*() to libata-sata.c
ata: move ata_qc_complete_multiple() to libata-sata.c
ata: move sata_link_hardreset() to libata-sata.c
ata: move sata_link_{debounce,resume}() to libata-sata.c
ata: move *sata_set_spd*() to libata-sata.c
ata: move sata_scr_*() to libata-sata.c
ata: start separating SATA specific code from libata-core.c
ata: let compiler optimize out ata_eh_set_lpm() on non-SATA hosts
ata: let compiler optimize out ata_dev_config_ncq() on non-SATA hosts
ata: add CONFIG_SATA_HOST=n version of ata_ncq_enabled()
ata: separate PATA timings code from libata-core.c
ata: fix CodingStyle issues in PATA timings code
ata: remove EXPORT_SYMBOL_GPL()s not used by modules
ata: move EXPORT_SYMBOL_GPL()s close to exported code
ata: optimize ata_scsi_rbuf[] size
...
Jiri Pirko [Sat, 28 Mar 2020 18:25:29 +0000 (19:25 +0100)]
net: devlink: use NL_SET_ERR_MSG_MOD instead of NL_SET_ERR_MSG
The rest of the devlink code sets the extack message using
NL_SET_ERR_MSG_MOD. Change the existing appearances of NL_SET_ERR_MSG
to NL_SET_ERR_MSG_MOD.
Jiri Pirko [Sat, 28 Mar 2020 15:37:43 +0000 (16:37 +0100)]
net: sched: expose HW stats types per action used by drivers
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.
$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Linus Torvalds [Mon, 30 Mar 2020 18:03:19 +0000 (11:03 -0700)]
Merge tag 'i3c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Boris Brezillon:
- Fix driver auto-probing related issues
- Stop using the deprecated i2c_new_device() function
- Replace zero-length array with flexible-array member
* tag 'i3c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: convert to use i2c_new_client_device()
i3c: master: Replace zero-length array with flexible-array member
i3c: Simplify i3c_device_match_id()
i3c: Generate aliases for i3c modules
i3c: Add a modalias sysfs attribute
i3c: Fix MODALIAS uevents
i3c: master: no need to iterate master device twice
Guangbin Huang [Sat, 28 Mar 2020 07:09:58 +0000 (15:09 +0800)]
net: hns3: fix set and get link ksettings issue
When device is not open, the service task which update the port
information per second is not running. In this case, the port
capabilities, including speed ability, autoneg ability, media type,
may be incorrect. Then get/set link ksetting may fail.
This patch fixes it by updating the port information before getting/
setting link ksettings when device is not open, and start timer
task immediately by setting delay time to 0 when device opens.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Guojia Liao [Sat, 28 Mar 2020 07:09:57 +0000 (15:09 +0800)]
net: hns3: fix RSS config lost after VF reset.
Currently, VF's RSS configuration would be set to default
after VF reset, the the user's one will loss.
To fix it, this patch separates hclgevf_rss_init_hw() into
two parts, one sets up the default RSS configuration and
just be called when driver loading, one configures the hardware
and be called by driver loading or reset.
Fixes: d97b30721301 ("net: hns3: Add RSS tuples support for VF") Signed-off-by: Guojia Liao <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Huazhong Tan [Sat, 28 Mar 2020 07:09:56 +0000 (15:09 +0800)]
net: hns3: fix for fraglist SKB headlen not handling correctly
When the fraglist SKB headlen is larger than zero, current code
still handle the fraglist SKB linear data as frag data, which may
cause TX error.
This patch adds a new DESC_TYPE_FRAGLIST_SKB type to handle the
mapping and unmapping of the fraglist SKB linear data buffer.
Fixes: 8ae10cfb5089 ("net: hns3: support tx-scatter-gather-fraglist feature") Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yunsheng Lin [Sat, 28 Mar 2020 07:09:55 +0000 (15:09 +0800)]
net: hns3: drop the WQ_MEM_RECLAIM flag when allocating WQ
The WQ in hns3 driver is allocated with WQ_MEM_RECLAIM flag
in order to guarantee forward progress, which may cause hns3'
WQ_MEM_RECLAIM WQ flushing infiniband' !WQ_MEM_RECLAIM WQ
warning:
There seems to be no specific guidance about how to handling the
forward progress guarantee of network device's WQ yet, and other
network device's WQ seem to be marked with WQ_MEM_RECLAIM without
a clear reason.
So this patch removes the WQ_MEM_RECLAIM flag when allocating WQ
to aviod the above warning.
Fixes: 0ea68902256e ("net: hns3: allocate WQ with WQ_MEM_RECLAIM flag") Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Mon, 30 Mar 2020 17:57:32 +0000 (10:57 -0700)]
Merge tag 'tpmdd-next-20200316' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"tpmdd updates for Linux v5.7"
* tag 'tpmdd-next-20200316' of git://git.infradead.org/users/jjs/linux-tpmdd:
KEYS: reaching the keys quotas correctly
tpm: ibmvtpm: Add support for TPM2
tpm: ibmvtpm: Wait for buffer to be set before proceeding
tpm: of: Handle IBM,vtpm20 case when getting log parameters
MAINTAINERS: adjust to trusted keys subsystem creation
tpm: tpm_tis_spi_cr50: use new structure for SPI transfer delays
tpm_tis_spi: use new 'delay' structure for SPI transfer delays
tpm: tpm2_bios_measurements_next should increase position index
tpm: tpm1_bios_measurements_next should increase position index
tpm: Don't make log failures fatal
There is no point in preparing the module name in a buffer. The format
string can be passed diectly to 'request_module()'.
This axes a few lines of code and cleans a few things:
- max len for a driver name is MODULE_NAME_LEN wich is ~ 60 chars,
not 128. It would be down-sized in 'request_module()'
- we should pass the total size of the buffer to 'snprintf()', not the
size minus 1