Git Repo - linux.git/log

mptcp: provide rmem[0] limit

The mptcp proto struct currently does not provide the
required limit for forward memory scheduling. Under
pressure sk_rmem_schedule() will unconditionally try
to use such field and will oops.

Address the issue inheriting the tcp limit, as we already
do for the wmem one.

Fixes: 9c3f94e1681b ("mptcp: add missing memory scheduling in the rx path")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/37af798bd46f402fb7c79f57ebbdd00614f5d7fa.1604861097.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

docs: networking: phy: s/2.5 times faster/2.5 times as fast/

2.5 times faster would be 3.5 Gbps (4.375 Gbaud after 8b/10b encoding).

Signed-off-by: Jonathan Neuschäfer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ethtool: netlink: add missing netdev_features_change() call

After updating userspace Ethtool from 5.7 to 5.9, I noticed that
NETDEV_FEAT_CHANGE is no more raised when changing netdev features
through Ethtool.
That's because the old Ethtool ioctl interface always calls
netdev_features_change() at the end of user request processing to
inform the kernel that our netdevice has some features changed, but
the new Netlink interface does not. Instead, it just notifies itself
with ETHTOOL_MSG_FEATURES_NTF.
Replace this ethtool_notify() call with netdev_features_change(), so
the kernel will be aware of any features changes, just like in case
with the ioctl interface. This does not omit Ethtool notifications,
as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops
ETHTOOL_MSG_FEATURES_NTF on it
(net/ethtool/netlink.c:ethnl_netdev_event()).

From v1 [1]:
- dropped extra new line as advised by Jakub;
- no functional changes.

[1] https://lore.kernel.org/netdev/[email protected]

Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Kubecek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies

Jianlin reports that a bridged IPv6 VXLAN endpoint, carrying IPv6
packets over a link with a PMTU estimation of exactly 1350 bytes,
won't trigger ICMPv6 Packet Too Big replies when the encapsulated
datagrams exceed said PMTU value. VXLAN over IPv6 adds 70 bytes of
overhead, so an ICMPv6 reply indicating 1280 bytes as inner MTU
would be legitimate and expected.

This comes from an off-by-one error I introduced in checks added
as part of commit 4cb47a8644cc ("tunnels: PMTU discovery support
for directly bridged IP packets"), whose purpose was to prevent
sending ICMPv6 Packet Too Big messages with an MTU lower than the
smallest permissible IPv6 link MTU, i.e. 1280 bytes.

In iptunnel_pmtud_check_icmpv6(), avoid triggering a reply only if
the advertised MTU would be less than, and not equal to, 1280 bytes.

Also fix the analogous comparison for IPv4, that is, skip the ICMP
reply only if the resulting MTU is strictly less than 576 bytes.

This becomes apparent while running the net/pmtu.sh bridged VXLAN
or GENEVE selftests with adjusted lower-link MTU values. Using
e.g. GENEVE, setting ll_mtu to the values reported below, in the
test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() test
function, we can see failures on the following tests:

             test                | ll_mtu
  -------------------------------|--------
  pmtu_ipv4_br_geneve4_exception |   626
  pmtu_ipv6_br_geneve4_exception |  1330
  pmtu_ipv6_br_geneve6_exception |  1350

owing to the different tunneling overheads implied by the
corresponding configurations.

Reported-by: Jianlin Shi <[email protected]>
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Stefano Brivio <[email protected]>
Link: https://lore.kernel.org/r/4f5fc2f33bfdf8409549fafd4f952b008bf04d63.1604681709.git.sbrivio@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

IPv6: Set SIT tunnel hard_header_len to zero

Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.

As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.

This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:
   - fix compilation error when PMD and PUD are folded
   - fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1
   - add aarch64 get-reg-list test

  x86:
   - fix semantic conflict between two series merged for 5.10
   - fix (and test) enforcement of paravirtual cpuid features

  selftests:
   - various cleanups to memory management selftests
   - new selftests testcase for performance of dirty logging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
  KVM: selftests: allow two iterations of dirty_log_perf_test
  KVM: selftests: Introduce the dirty log perf test
  KVM: selftests: Make the number of vcpus global
  KVM: selftests: Make the per vcpu memory size global
  KVM: selftests: Drop pointless vm_create wrapper
  KVM: selftests: Add wrfract to common guest code
  KVM: selftests: Simplify demand_paging_test with timespec_diff_now
  KVM: selftests: Remove address rounding in guest code
  KVM: selftests: Factor code out of demand_paging_test
  KVM: selftests: Use a single binary for dirty/clear log test
  KVM: selftests: Always clear dirty bitmap after iteration
  KVM: selftests: Add blessed SVE registers to get-reg-list
  KVM: selftests: Add aarch64 get-reg-list test
  selftests: kvm: test enforcement of paravirtual cpuid features
  selftests: kvm: Add exception handling to selftests
  selftests: kvm: Clear uc so UCALL_NONE is being properly reported
  selftests: kvm: Fix the segment descriptor layout to match the actual layout
  KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
  kvm: x86: request masterclock update any time guest uses different msr
  kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
  ...

Merge tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
"This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
  refactoring"

* tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux:
  net/sunrpc: fix useless comparison in proc_do_xprt()
  net/sunrpc: return 0 on attempt to write to "transports"
  NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy
  NFSD: Fix use-after-free warning when doing inter-server copy
  NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL
  SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
  NFSD: NFSv3 PATHCONF Reply is improperly formed

Merge tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes and cleanups from Ted Ts'o:
"More fixes and cleanups for the new fast_commit features, but also a
  few other miscellaneous bug fixes and a cleanup for the MAINTAINERS
  file"

* tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (28 commits)
  jbd2: fix up sparse warnings in checkpoint code
  ext4: fix sparse warnings in fast_commit code
  ext4: cleanup fast commit mount options
  jbd2: don't start fast commit on aborted journal
  ext4: make s_mount_flags modifications atomic
  ext4: issue fsdev cache flush before starting fast commit
  ext4: disable fast commit with data journalling
  ext4: fix inode dirty check in case of fast commits
  ext4: remove unnecessary fast commit calls from ext4_file_mmap
  ext4: mark buf dirty before submitting fast commit buffer
  ext4: fix code documentatioon
  ext4: dedpulicate the code to wait on inode that's being committed
  jbd2: don't read journal->j_commit_sequence without taking a lock
  jbd2: don't touch buffer state until it is filled
  jbd2: add todo for a fast commit performance optimization
  jbd2: don't pass tid to jbd2_fc_end_commit_fallback()
  jbd2: don't use state lock during commit path
  jbd2: drop jbd2_fc_init documentation
  ext4: clean up the JBD2 API that initializes fast commits
  jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
  ...

Merge tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"A week ago, Vladimir reported an issue that the kernel log would
  become polluted if the page allocation debug option is enabled. I also
  found this when I cleaned up magical page->mapping and originally
  planned to submit these all for 5.11 but it seems the impact can be
  noticed so submit the fix in advance.

  In addition, nl6720 also reported that atime is empty although EROFS
  has the only one on-disk timestamp as a practical consideration for
  now but it's better to derive it as what we did for the other
  timestamps.

  Summary:

   - fix setting up pcluster improperly for temporary pages

   - derive atime instead of leaving it empty"

* tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix setting up pcluster for temporary pages
  erofs: derive atime instead of leaving it empty

ACPI: button: Add DMI quirk for Medion Akoya E2228T

The Medion Akoya E2228T's ACPI _LID implementation is quite broken,
it has the same issues as the one from the Medion Akoya E2215T:

1. For notifications it uses an ActiveLow Edge GpioInt, rather then
an ActiveBoth one, meaning that the device is only notified when the
lid is closed, not when it is opened.

2. Matching with this its _LID method simply always returns 0 (closed)

In order for the Linux LID code to work properly with this implementation,
the lid_init_state selection needs to be set to ACPI_BUTTON_LID_INIT_OPEN,
add a DMI quirk for this.

While working on this I also found out that the MD60### part of the model
number differs per country/batch while all of the E2215T and E2228T models
have this issue, so also remove the " MD60198" part from the E2215T quirk.

Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: GED: fix -Wformat

Clang is more aggressive about -Wformat warnings when the format flag
specifies a type smaller than the parameter. It turns out that gsi is an
int. Fixes:

drivers/acpi/evged.c:105:48: warning: format specifies type 'unsigned
char' but the argument has type 'unsigned int' [-Wformat]
trigger == ACPI_EDGE_SENSITIVE ? 'E' : 'L', gsi);
^~~

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Fixes: ea6f3af4c5e6 ("ACPI: GED: add support for _Exx / _Lxx handler methods")
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: Fix whitespace inconsistencies

Replaces spaces with tabs where spaces have been (inconsistently) used
for indentation and removes trailing whitespaces.

Signed-off-by: Maximilian Luz <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name

For some reason building with W=1 doesn't pick up on this, but the
kerneldoc name for acpi_dma_configure_id() is not right, so fix it up.

Signed-off-by: John Garry <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Documentation: firmware-guide: gpio-properties: Clarify initial output state

GpioIo() doesn't provide an explicit state for an output pin.
Linux tries to be smart and uses a common sense based on other
parameters. Document how it looks like in the code.

Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()

It appears that people may misinterpret active_low field in _DSD
for GpioInt() resource. Add a paragraph to clarify this.

Reported-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Reviewed-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Documentation: firmware-guide: gpio-properties: Fix factual mistakes

Fix factual mistakes and style issues in GPIO properties document.
This converts IoRestriction from InputOnly to OutputOnly as pins
in the example are used as outputs.

Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

KVM: selftests: allow two iterations of dirty_log_perf_test

Even though one iteration is not enough for the dirty log performance
test (due to the cost of building page tables, zeroing memory etc.)
two is okay and it is the default. Without this patch,
"./dirty_log_perf_test" without any further arguments fails.

Cc: Ben Gardon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Linux 5.10-rc3

net/sunrpc: fix useless comparison in proc_do_xprt()

In the original code, the "if (*lenp < 0)" check didn't work because
"*lenp" is unsigned. Fortunately, the memory_read_from_buffer() call
will never fail in this context so it doesn't affect runtime.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

Merge tag 'driver-core-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core documentation fixes from Greg KH:
"Some small Documentation fixes that were fallout from the larger
  documentation update we did in 5.10-rc2.

  Nothing major here at all, but all of these have been in linux-next
  and resolve build warnings when building the documentation files"

* tag 'driver-core-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: remove mic/index from misc-devices/index.rst
  scripts: get_api.pl: Add sub-titles to ABI output
  scripts: get_abi.pl: Don't let ABI files to create subtitles
  docs: leds: index.rst: add a missing file
  docs: ABI: sysfs-class-net: fix a typo
  docs: ABI: sysfs-driver-dma-ioatdma: what starts with /sys

Merge tag 'tty-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are a small number of small tty and serial fixes for some
  reported problems for the tty core, vt code, and some serial drivers.

  They include fixes for:

   - a buggy and obsolete vt font ioctl removal

   - 8250_mtk serial baudrate runtime warnings

   - imx serial earlycon build configuration fix

   - txx9 serial driver error path cleanup issues

   - tty core fix in release_tty that can be triggered by trying to bind
     an invalid serial port name to a speakup console device

  Almost all of these have been in linux-next without any problems, the
  only one that hasn't, just deletes code :)"

* tag 'tty-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt: Disable KD_FONT_OP_COPY
  tty: fix crash in release_tty if tty->port is not set
  serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init
  tty: serial: imx: enable earlycon by default if IMX_SERIAL_CONSOLE is enabled
  serial: 8250_mtk: Fix uart_get_baud_rate warning

Merge tag 'usb-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids:

   - USB gadget fixes for some reported issues

   - Fixes for the ever-troublesome apple fastcharge driver, hopefully
     we finally have it right.

   - More USB core quirks for odd devices

   - USB serial driver fixes for some long-standing issues that were
     recently found

   - some new USB serial driver device ids

  All have been in linux-next with no reported issues"

* tag 'usb-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: apple-mfi-fastcharge: fix reference leak in apple_mfi_fc_set_property
  usb: mtu3: fix panic in mtu3_gadget_stop()
  USB: serial: option: add Telit FN980 composition 0x1055
  USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
  USB: serial: cyberjack: fix write-URB completion race
  USB: Add NO_LPM quirk for Kingston flash drive
  USB: serial: option: add Quectel EC200T module support
  usb: raw-gadget: fix memory leak in gadget_setup
  usb: dwc2: Avoid leaving the error_debugfs label unused
  usb: dwc3: ep0: Fix delay status handling
  usb: gadget: fsl: fix null pointer checking
  usb: gadget: goku_udc: fix potential crashes in probe
  usb: dwc3: pci: add support for the Intel Alder Lake-S

fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent

current->group_leader->exit_signal may change during copy_process() if
current->real_parent exits.

Move the assignment inside tasklist_lock to avoid the race.

Signed-off-by: Eddy Wu <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vt: Disable KD_FONT_OP_COPY

It's buggy:

On Fri, Nov 06, 2020 at 10:30:08PM +0800, Minh Yuan wrote:
> We recently discovered a slab-out-of-bounds read in fbcon in the latest
> kernel ( v5.10-rc2 for now ).  The root cause of this vulnerability is that
> "fbcon_do_set_font" did not handle "vc->vc_font.data" and
> "vc->vc_font.height" correctly, and the patch
> <https://lkml.org/lkml/2020/9/27/223> for VT_RESIZEX can't handle this
> issue.
>
> Specifically, we use KD_FONT_OP_SET to set a small font.data for tty6, and
> use  KD_FONT_OP_SET again to set a large font.height for tty1. After that,
> we use KD_FONT_OP_COPY to assign tty6's vc_font.data to tty1's vc_font.data
> in "fbcon_do_set_font", while tty1 retains the original larger
> height. Obviously, this will cause an out-of-bounds read, because we can
> access a smaller vc_font.data with a larger vc_font.height.

Further there was only one user ever.
- Android's loadfont, busybox and console-tools only ever use OP_GET
  and OP_SET
- fbset documentation only mentions the kernel cmdline font: option,
  not anything else.
- systemd used OP_COPY before release 232 published in Nov 2016

Now unfortunately the crucial report seems to have gone down with
gmane, and the commit message doesn't say much. But the pull request
hints at OP_COPY being broken

https://github.com/systemd/systemd/pull/3651

So in other words, this never worked, and the only project which
foolishly every tried to use it, realized that rather quickly too.

Instead of trying to fix security issues here on dead code by adding
missing checks, fix the entire thing by removing the functionality.

Note that systemd code using the OP_COPY function ignored the return
value, so it doesn't matter what we're doing here really - just in
case a lone server somewhere happens to be extremely unlucky and
running an affected old version of systemd. The relevant code from
font_copy_to_all_vcs() in systemd was:

/* copy font from active VT, where the font was uploaded to */
cfo.op = KD_FONT_OP_COPY;
cfo.height = vcs.v_active-1; /* tty1 == index 0 */
(void) ioctl(vcfd, KDFONTOP, &cfo);

Note this just disables the ioctl, garbage collecting the now unused
callbacks is left for -next.

v2: Tetsuo found the old mail, which allowed me to find it on another
archive. Add the link too.

Acked-by: Peilin Ye <[email protected]>
Reported-by: Minh Yuan <[email protected]>
References: https://lists.freedesktop.org/archives/systemd-devel/2016-June/036935.html
References: https://github.com/systemd/systemd/pull/3651
Cc: Greg KH <[email protected]>
Cc: Peilin Ye <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'xfs-5.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

- Fix an uninitialized struct problem

- Fix an iomap problem zeroing unwritten EOF blocks

- Fix some clumsy error handling when writeback fails on filesystems
   with blocksize < pagesize

- Fix a retry loop not resetting loop variables properly

- Fix scrub flagging rtinherit inodes on a non-rt fs, since the kernel
   actually does permit that combination

- Fix excessive page cache flushing when unsharing part of a file

* tag 'xfs-5.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: only flush the unshared range in xfs_reflink_unshare
  xfs: fix scrub flagging rtinherit even if there is no rt device
  xfs: fix missing CoW blocks writeback conversion retry
  iomap: clean up writeback state logic on writepage error
  iomap: support partial page discard on writeback block mapping failure
  xfs: flush new eof page on truncate to avoid post-eof corruption
  xfs: set xefi_discard when creating a deferred agfl free log intent item

Merge branch 'hch' (patches from Christoph)

Merge procfs splice read fixes from Christoph Hellwig:
"Greg reported a problem due to the fact that Android tests use procfs
  files to test splice, which stopped working with the changes for
  set_fs() removal.

  This series adds read_iter support for seq_file, and uses those for
  various proc files using seq_file to restore splice read support"

[ Side note: Christoph initially had a scripted "move everything over"
  patch, which looks fine, but I personally would prefer us to actively
  discourage splice() on random files.  So this does just the minimal
  basic core set of proc file op conversions.

  For completeness, and in case people care, that script was

     sed -i -e 's/\.proc_read$\s*=\s*$seq_read/\.proc_read_iter\1seq_read_iter/g'

  but I'll wait and see if somebody has a strong argument for using
  splice on random small /proc files before I'd run it on the whole
  kernel.   - Linus ]

* emailed patches from Christoph Hellwig <[email protected]>:
  proc "seq files": switch to ->read_iter
  proc "single files": switch to ->read_iter
  proc/stat: switch to ->read_iter
  proc/cpuinfo: switch to ->read_iter
  proc: wire up generic_file_splice_read for iter ops
  seq_file: add seq_read_iter

Merge tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:

   - Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
     combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
     integrated assembler upset

   - Correct the mitigation selection logic which prevented the related
     prctl to work correctly

   - Make the UV5 hubless system work correctly by fixing up the
     malformed table entries and adding the missing ones"

* tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Recognize UV5 hubless system identifier
  x86/platform/uv: Remove spaces from OEM IDs
  x86/platform/uv: Fix missing OEM_TABLE_ID
  x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
  x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S

Merge tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
"A single fix for the perf core plugging a memory leak in the address
filter parser"

* tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix a memory leak in perf_event_parse_addr_filter()

Merge tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull futex fix from Thomas Gleixner:
"A single fix for the futex code where an intermediate state in the
  underlying RT mutex was not handled correctly and triggering a BUG()
  instead of treating it as another variant of retry condition"

* tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Handle transient "ownerless" rtmutex state correctly

Merge tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:

   - Fix the fallout of the IPI as interrupt conversion in Kconfig and
     the BCM2836 interrupt chip driver

   - Fixes for interrupt affinity setting and the handling of
     hierarchical irq domains in the SiFive PLIC driver

   - Make the unmapped event handling in the TI SCI driver work
     correctly

   - A few minor fixes and cleanups in various chip drivers and Kconfig"

* tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events
  irqchip/ti-sci-inta: Add support for unmapped event handling
  dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling
  irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
  irqchip/sifive-plic: Fix chip_data access within a hierarchy
  irqchip/sifive-plic: Fix broken irq_set_affinity() callback
  irqchip/stm32-exti: Add all LP timer exti direct events support
  irqchip/bcm2836: Fix missing __init annotation
  irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY
  irqchip/mst: Make mst_intc_of_init static
  irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7
  genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY

Merge tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull entry code fix from Thomas Gleixner:
"A single fix for the generic entry code to correct the wrong
  assumption that the lockdep interrupt state needs not to be
  established before calling the RCU check"

* tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Fix the incorrect ordering of lockdep and RCU check

Merge tag 'powerpc-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- fix miscompilation with GCC 4.9 by using asm_goto_volatile for put_user()

- fix for an RCU splat at boot caused by a recent lockdep change

- fix for a possible deadlock in our EEH debugfs code

- several fixes for handling of _PAGE_ACCESSED on 32-bit platforms

- build fix when CONFIG_NUMA=n

Thanks to Andreas Schwab, Christophe Leroy, Oliver O'Halloran, Qian Cai,
and Scott Cheloha.

* tag 'powerpc-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/numa: Fix build when CONFIG_NUMA=n
  powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
  powerpc/8xx: Always fault when _PAGE_ACCESSED is not set
  powerpc/40x: Always fault when _PAGE_ACCESSED is not set
  powerpc/603: Always fault when _PAGE_ACCESSED is not set
  powerpc: Use asm_goto_volatile for put_user()
  powerpc/smp: Call rcu_cpu_starting() earlier
  powerpc/eeh_cache: Fix a possible debugfs deadlock

KVM: selftests: Introduce the dirty log perf test

The dirty log perf test will time verious dirty logging operations
(enabling dirty logging, dirtying memory, getting the dirty log,
clearing the dirty log, and disabling dirty logging) in order to
quantify dirty logging performance. This test can be used to inform
future performance improvements to KVM's dirty logging infrastructure.

This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.

Signed-off-by: Ben Gardon <[email protected]>
Message-Id: <20201027233733.1484855 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Make the number of vcpus global

We also check the input number of vcpus against the maximum supported.

Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <20201104212357 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Make the per vcpu memory size global

Rename vcpu_memory_bytes to something with "percpu" in it
in order to be less ambiguous. Also make it global to
simplify things.

Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <20201104212357 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Drop pointless vm_create wrapper

Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <20201104212357 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Add wrfract to common guest code

Wrfract will be used by the dirty logging perf test introduced later in
this series to dirty memory sparsely.

This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.

Signed-off-by: Ben Gardon <[email protected]>
Message-Id: <20201027233733.1484855 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Simplify demand_paging_test with timespec_diff_now

Add a helper function to get the current time and return the time since
a given start time. Use that function to simplify the timekeeping in the
demand paging test.

This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.

Signed-off-by: Ben Gardon <[email protected]>
Message-Id: <20201027233733.1484855 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Remove address rounding in guest code

Rounding the address the guest writes to a host page boundary
will only have an effect if the host page size is larger than the guest
page size, but in that case the guest write would still go to the same
host page. There's no reason to round the address down, so remove the
rounding to simplify the demand paging test.

This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.

Signed-off-by: Ben Gardon <[email protected]>
Message-Id: <20201027233733.1484855 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Factor code out of demand_paging_test

Much of the code in demand_paging_test can be reused by other, similar
multi-vCPU-memory-touching-perfromance-tests. Factor that common code
out for reuse.

No functional change expected.

This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.

Signed-off-by: Ben Gardon <[email protected]>
Message-Id: <20201027233733.1484855 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Use a single binary for dirty/clear log test

Remove the clear_dirty_log test, instead merge it into the existing
dirty_log_test. It should be cleaner to use this single binary to do
both tests, also it's a preparation for the upcoming dirty ring test.

The default behavior will run all the modes in sequence.

Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20201001012233 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Always clear dirty bitmap after iteration

We used not to clear the dirty bitmap before because KVM_GET_DIRTY_LOG
would overwrite it the next time it copies the dirty log onto it.
In the upcoming dirty ring tests we'll start to fetch dirty pages from
a ring buffer, so no one is going to clear the dirty bitmap for us.

Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20201001012228 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Add blessed SVE registers to get-reg-list

Add support for the SVE registers to get-reg-list and create a
new test, get-reg-list-sve, which tests them when running on a
machine with SVE support.

Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <20201029201703 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Add aarch64 get-reg-list test

Check for KVM_GET_REG_LIST regressions. The blessed list was
created by running on v4.15 with the --core-reg-fixup option.
The following script was also used in order to annotate system
registers with their names when possible. When new system
registers are added the names can just be added manually using
the same grep.

while read reg; do
if [[ ! $reg =~ ARM64_SYS_REG ]]; then
printf "\t$reg\n"
continue
fi
encoding=$(echo "$reg" | sed "s/ARM64_SYS_REG(//;s/),//")
if ! name=$(grep "$encoding" ../../../../arch/arm64/include/asm/sysreg.h); then
printf "\t$reg\n"
continue
fi
name=$(echo "$name" | sed "s/.*SYS_//;s/[\t ]*sys_reg($encoding)$//")
printf "\t$reg\t/* $name */\n"
done < <(aarch64/get-reg-list --core-reg-fixup --list)

Signed-off-by: Andrew Jones <[email protected]>
Message-Id: <20201029201703 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: test enforcement of paravirtual cpuid features

Add a set of tests that ensure the guest cannot access paravirtual msrs
and hypercalls that have been disabled in the KVM_CPUID_FEATURES leaf.
Expect a #GP in the case of msr accesses and -KVM_ENOSYS from
hypercalls.

Cc: Jim Mattson <[email protected]>
Signed-off-by: Oliver Upton <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Reviewed-by: Aaron Lewis <[email protected]>
Message-Id: <20201027231044 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: Add exception handling to selftests

Add the infrastructure needed to enable exception handling in selftests.
This allows any of the exception and interrupt vectors to be overridden
in the guest.

Signed-off-by: Aaron Lewis <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
Message-Id: <20201012194716.3950330 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: Clear uc so UCALL_NONE is being properly reported

Ensure the out value 'uc' in get_ucall() is properly reporting
UCALL_NONE if the call fails. The return value will be correctly
reported, however, the out parameter 'uc' will not be. Clear the struct
to ensure the correct value is being reported in the out parameter.

Signed-off-by: Aaron Lewis <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
Message-Id: <20201012194716.3950330 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: Fix the segment descriptor layout to match the actual layout

Fix the layout of 'struct desc64' to match the layout described in the
SDM Vol 3, Chapter 3 "Protected-Mode Memory Management", section 3.4.5
"Segment Descriptors", Figure 3-8 "Segment Descriptor". The test added
later in this series relies on this and crashes if this layout is not
correct.

Signed-off-by: Aaron Lewis <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
Message-Id: <20201012194716.3950330 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs

Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:

kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop"

This patch fixes this by enabling error logging only with
'report_ignored_msrs=1'.

Signed-off-by: Pankaj Gupta <[email protected]>
Message-Id: <20201105153932 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: request masterclock update any time guest uses different msr

Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.

Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Message-Id: <20201027231044 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: ensure pv_cpuid.features is initialized when enabling cap

Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.

Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Message-Id: <20201027231044 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: reads of restricted pv msrs should also result in #GP

commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.

Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Message-Id: <20201027231044 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: use positive error values for msr emulation that causes #GP

Recent introduction of the userspace msr filtering added code that uses
negative error codes for cases that result in either #GP delivery to
the guest, or handled by the userspace msr filtering.

This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.

Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.

Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20201101115523 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Update entry for KVM_CAP_ENFORCE_PV_CPUID

Should be squashed into 66570e966dd9cb4f.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20201023183358 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Update entry for KVM_X86_SET_MSR_FILTER

It should be an accident when rebase, since we've already have section
8.25 (which is KVM_CAP_S390_DIAG318). Fix the number.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20201001012044 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/mmu: fix counting of rmap entries in pte_list_add

Fix an off-by-one style bug in pte_list_add() where it failed to
account the last full set of SPTEs, i.e. when desc->sptes is full
and desc->more is NULL.

Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid
an extra comparison.

Signed-off-by: Li RongQing <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <1601196297 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvmarm-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for v5.10, take #2

- Fix compilation error when PMD and PUD are folded
- Fix regresssion of the RAZ behaviour of ID_AA64ZFR0_EL1

Merge tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request from Christoph:
    - revert a nvme_queue size optimization (Keith Bush)
    - fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"

- null_blk zone locking fix (Damien)

* tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
  null_blk: Fix scheduling in atomic with zoned mode
  nvme-tcp: avoid repeated request completion
  nvme-rdma: avoid repeated request completion
  nvme-tcp: avoid race between time out and tear down
  nvme-rdma: avoid race between time out and tear down
  nvme: introduce nvme_sync_io_queues
  Revert "nvme-pci: remove last_sq_tail"

Merge tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"A set of fixes for io_uring:

   - SQPOLL cancelation fixes

   - Two fixes for the io_identity COW

   - Cancelation overflow fix (Pavel)

   - Drain request cancelation fix (Pavel)

   - Link timeout race fix (Pavel)"

* tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
  io_uring: fix link lookup racing with link timeout
  io_uring: use correct pointer for io_uring_show_cred()
  io_uring: don't forget to task-cancel drained reqs
  io_uring: fix overflowed cancel w/ linked ->files
  io_uring: drop req/tctx io_identity separately
  io_uring: ensure consistent view of original task ->mm from SQPOLL
  io_uring: properly handle SQPOLL request cancelations
  io-wq: cancel request if it's asking for files and we don't have them

futex: Handle transient "ownerless" rtmutex state correctly

Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio       Operation
T1   120 lock(F)
T2   120 lock(F)   -> blocks (top waiter)
T3   50 (RT) lock(F)   -> boosts T1 and blocks (new top waiter)
XX    timeout/  -> wakes T2
signal
T1   50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120 cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
           and the lower priority T2 cannot steal the lock.
        -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]

net: marvell: prestera: fix compilation with CONFIG_BRIDGE=m

With CONFIG_BRIDGE=m the compilation fails:

ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event':
prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled'

in case the driver is statically enabled.

Fix it by adding 'BRIDGE || BRIDGE=n' dependency.

Fixes: e1189d9a5fbe ("net: marvell: prestera: Add Switchdev driver implementation")
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Vadym Kochan <[email protected]>
Acked-by: Randy Dunlap <[email protected]> # build-tested
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2020-11-03

v1->v2:
- Fix fixes line tag in patch #1
- Toss ktls refcount leak fix, Maxim will look further into the root
   cause.
- Toss eswitch chain 0 prio patch, until we determine if it is needed
   for -rc and net.

* tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix incorrect access of RCU-protected xdp_prog
  net/mlx5e: Fix VXLAN synchronization after function reload
  net/mlx5: E-switch, Avoid extack error log for disabled vport
  net/mlx5: Fix deletion of duplicate rules
  net/mlx5e: Use spin_lock_bh for async_icosq_lock
  net/mlx5e: Protect encap route dev from concurrent release
  net/mlx5e: Fix modify header actions memory leak
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8169: disable hw csum for short packets on all chip versions

RTL8125B has same or similar short packet hw padding bug as RTL8168evl.
The main workaround has been extended accordingly, however we have to
disable also hw checksumming for short packets on affected new chip
versions. Instead of checking for an affected chip version let's
simply disable hw checksumming for short packets in general.

v2:
- remove the version checks and disable short packet hw csum in general
- reflect this in commit title and message

Fixes: 0439297be951 ("r8169: add support for RTL8125B")
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8169: fix potential skb double free in an error path

The caller of rtl8169_tso_csum_v2() frees the skb if false is returned.
eth_skb_pad() internally frees the skb on error what would result in a
double free. Therefore use __skb_put_padto() directly and instruct it
to not free the skb on error.

Fixes: b423e9ae49d7 ("r8169: fix offloaded tx checksum for small packets.")
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Driver bugfixes for I2C.

  Most of them are for the new mlxbf driver which got more exposure
  after rc1. The sh_mobile patch should already have reached you during
  the merge window, but I accidently dropped it. However, since it fixes
  a problem with rebooting, it is still fine for rc3"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: slave should do WRITE_REQUESTED before WRITE_RECEIVED
  i2c: designware: call i2c_dw_read_clear_intrbits_slave() once
  i2c: mlxbf: I2C_MLXBF should depend on MELLANOX_PLATFORM
  i2c: mlxbf: Update author and maintainer email info
  i2c: mlxbf: Update reference clock frequency
  i2c: mlxbf: Remove unecessary wrapper functions
  i2c: mlxbf: Fix resrticted cast warning of sparse
  i2c: mlxbf: Add CONFIG_ACPI to guard ACPI function call
  i2c: sh_mobile: implement atomic transfers
  i2c: mediatek: move dma reset before i2c reset

Merge tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- SPDX comment style fix

- ignore memory that is unusable

- avoid setting a kernel text offset for the !MMU kernels, where
   skipping the first page of memory is both unnecessary and costly

- avoid passing the flag bits in satp to pfn_to_virt()

- fix __put_kernel_nofault, where we had the arguments to
   __put_user_nocheck reversed

- workaround for a bug in the FU540 to avoid triggering PMP issues
   during early boot

- change to how we pull symbols out of the vDSO. The old mechanism was
   removed from binutils-2.35 (and has been backported to Debian's 2.34)

* tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix the VDSO symbol generaton for binutils-2.35+
  RISC-V: Use non-PGD mappings for early DTB access
  riscv: uaccess: fix __put_kernel_nofault()
  riscv: fix pfn_to_virt err in do_page_fault().
  riscv: Set text_offset correctly for M-Mode
  RISC-V: Remove any memblock representing unusable memory area
  risc-v: kernel: ftrace: Fixes improper SPDX comment style

Merge tag 'usb-serial-5.10-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for 5.10-rc3

Here's a fix for a long-standing issue with the cyberjack driver and
some new device ids.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.10-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add Telit FN980 composition 0x1055
  USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
  USB: serial: cyberjack: fix write-URB completion race
  USB: serial: option: add Quectel EC200T module support

perf/core: Fix a memory leak in perf_event_parse_addr_filter()

As shown through runtime testing, the "filename" allocation is not
always freed in perf_event_parse_addr_filter().

There are three possible ways that this could happen:

- It could be allocated twice on subsequent iterations through the loop,
- or leaked on the success path,
- or on the failure path.

Clean up the code flow to make it obvious that 'filename' is always
freed in the reallocation path and in the two return paths as well.

We rely on the fact that kfree(NULL) is NOP and filename is initialized
with NULL.

This fixes the leak. No other side effects expected.

[ Dan Carpenter: cleaned up the code flow & added a changelog. ]
[ Ingo Molnar: updated the changelog some more. ]

Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Signed-off-by: "kiyin(尹亮)" <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Anthony Liguori <[email protected]>
--
kernel/events/core.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

x86/platform/uv: Recognize UV5 hubless system identifier

Testing shows a problem in that UV5 hubless systems were not being
recognized. Add them to the list of OEM IDs checked.

Fixes: 6c7794423a998 ("Add UV5 direct references")
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/platform/uv: Remove spaces from OEM IDs

Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID.  One being that the OEM_ID would not string compare
correctly.  Another the OEM_ID and OEM_TABLE_ID would be concatenated
in the printout.  Remove any trailing spaces.

Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/platform/uv: Fix missing OEM_TABLE_ID

Testing shows a problem in that the OEM_TABLE_ID was missing for
hubless systems. This is used to determine the APIC type (legacy or
extended). Add the OEM_TABLE_ID to the early hubless processing.

Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

jbd2: fix up sparse warnings in checkpoint code

Add missing __acquires() and __releases() annotations. Also, in an
"this should never happen" WARN_ON check, if it *does* actually
happen, we need to release j_state_lock since this function is always
supposed to release that lock. Otherwise, things will quickly grind
to a halt after the WARN_ON trips.

Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...")
Cc: [email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix sparse warnings in fast_commit code

Add missing __acquire() and __releases() annotations, and make
fc_ineligible_reasons[] static, as it is not used outside of
fs/ext4/fast_commit.c.

Signed-off-by: Theodore Ts'o <[email protected]>

ext4: cleanup fast commit mount options

Drop no_fc mount option that disable fast commit even if it was
enabled at mkfs time. Move fc_debug_force mount option under ifdef
EXT4_DEBUG to annotate that this is strictly for debugging and testing
purposes and should not be used in production.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: don't start fast commit on aborted journal

Fast commit should not be started if the journal is aborted.

Signed-off-by: Harshad Shirwadkar<[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: make s_mount_flags modifications atomic

Fast commit file system states are recorded in
sbi->s_mount_flags. Fast commit expects these bit manipulations to be
atomic. This patch adds helpers to make those modifications atomic.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: issue fsdev cache flush before starting fast commit

If the journal dev is different from fsdev, issue a cache flush before
committing fast commit blocks to disk.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: disable fast commit with data journalling

Fast commits don't work with data journalling. This patch disables the
fast commit support when data journalling is turned on.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix inode dirty check in case of fast commits

In case of fast commits, determine if the inode is dirty by checking
if the inode is on fast commit list. This also helps us get rid of
ext4_inode_info.i_fc_committed_subtid field.

Reported-by: Andrea Righi <[email protected]>
Tested-by: Andrea Righi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: remove unnecessary fast commit calls from ext4_file_mmap

Remove unnecessary calls to ext4_fc_start_update() and
ext4_fc_stop_update() from ext4_file_mmap().

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: mark buf dirty before submitting fast commit buffer

Mark the fast commit buffer as dirty before submission.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix code documentatioon

Add a TODO to remember fixing REQ_FUA | REQ_PREFLUSH for fast commit
buffers. Also, fix a typo in top level comment in fast_commit.c

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: dedpulicate the code to wait on inode that's being committed

This patch removes the deduplicates the code that implements waiting
on inode that's being committed. That code is moved into a new
function.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: don't read journal->j_commit_sequence without taking a lock

Take journal state lock before reading journal->j_commit_sequence.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: don't touch buffer state until it is filled

Fast commit buffers should be filled in before toucing their
state. Remove code that sets buffer state as dirty before the buffer
is passed to the file system.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: add todo for a fast commit performance optimization

Fast commit performance can be optimized if commit thread doesn't wait
for ongoing fast commits to complete until the transaction enters
T_FLUSH state. Document this optimization.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: don't pass tid to jbd2_fc_end_commit_fallback()

In jbd2_fc_end_commit_fallback(), we know which tid to commit. There's
no need for caller to pass it.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: don't use state lock during commit path

Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during
commit path. Since today we allow only one process to perform a fast
commit, there is no need take state lock before accessing these
variables. This patch removes these locks and adds comments to
describe this.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: drop jbd2_fc_init documentation

Now that jbd2_fc_init is dropped, drop its docs too.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: clean up the JBD2 API that initializes fast commits

This patch removes jbd2_fc_init() API and its related functions to
simplify enabling fast commits. With this change, the number of fast
commit blocks to use is solely determined by the JBD2 layer. So, we
move the default value for minimum number of fast commit blocks from
ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to
use fast commits is determined by the file system. The file system
just sets the fast commit feature using
jbd2_journal_set_features(). JBD2 layer then determines how many
blocks to use for fast commits (based on the value found in the JBD2
superblock).

Note that the JBD2 feature flag of fast commits is just an indication
that there are fast commit blocks present on disk. It doesn't tell
JBD2 layer about the intent of the file system of whether to it wants
to use fast commit or not. That's why, we blindly clear the fast
commit flag in journal_reset() after the recovery is done.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs

The on-disk superblock field sb->s_maxlen represents the total size of
the journal including the fast commit area and is no more the max
number of blocks available for a transaction. The maximum number of
blocks available to a transaction is reduced by the number of fast
commit blocks. So, this patch renames j_maxlen to j_total_len to
better represent its intent. Also, it adds a function to calculate max
number of bufs available for a transaction.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fixup ext4_fc_track_* functions' signature

Firstly, pass handle to all ext4_fc_track_* functions and use
transaction id found in handle->h_transaction->h_tid for tracking fast
commit updates. Secondly, don't pass inode to
ext4_fc_track_link/create/unlink functions. inode can be found inside
these functions as d_inode(dentry). However, rename path is an
exeception. That's because in that case, we need inode that's not same
as d_inode(dentry). To handle that, add a couple of low-level wrapper
functions that take inode and dentry as arguments.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: drop redundant calls ext4_fc_track_range

ext4_fc_track_range() should only be called when blocks are added or
removed from an inode. So, the only places from where we need to call
this function are ext4_map_blocks(), punch hole, collapse / zero
range, truncate. Remove all the other redundant calls to ths function.

Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: mark fc ineligible if inode gets evictied due to mem pressure

If inode gets evicted due to memory pressure, we have to remove it
from the fast commit list. However, that inode may have uncommitted
changes that fast commits will lose. So, just fall back to full
commits in this case. Also, rename the fast commit ineligiblity reason
from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_NOMEM" for better
expression.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: describe fast_commit feature flags

Fast commit feature has flags in the file system as well in JBD2. The
meaning of fast commit feature flags can get confusing. Update docs
and code to add more documentation about it.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Harshad Shirwadkar <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: unlock xattr_sem properly in ext4_inline_data_truncate()

It takes xattr_sem to check inline data again but without unlock it
in case not have. So unlock it before return.

Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly")
Reported-by: Dan Carpenter <[email protected]>
Cc: Tao Ma <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

ext4: silence an uninitialized variable warning

Smatch complains that "i" can be uninitialized if we don't enter the
loop.  I don't know if it's possible but we may as well silence this
warning.

[ Initialize i to sb->s_blocksize instead of 0.  The only way the for
  loop could be skipped entirely is the in-memory data structures, in
  particular the bh->b_data for the on-disk superblock has gotten
  corrupted enough that calculated value of group is >= to
  ext4_get_groups_count(sb).  In that case, we want to exit
  immediately without allocating a block.  -- TYT ]

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/20201030114620.GB3251003@mwanda
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

MAINTAINERS: add missing file in ext4 entry

include/trace/events/ext4.h belongs to ext4 module, add the file path into
ext4 entry in MAINTAINERS.

Signed-off-by: Chao Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA

The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled.  Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument.  It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.

[ Fixed up the commit description to better explain why the fix
  works. --TYT ]

Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options")
Signed-off-by: Kaixu Xia <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2020-11-06

1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.

2) Tighten bpf_lsm function check, from KP.

3) Fix bpftool attaching to flow dissector, from Lorenz.

4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Update verification logic for LSM programs
  bpf: Zero-fill re-used per-cpu map element
  bpf: BPF_PRELOAD depends on BPF_SYSCALL
  tools/bpftool: Fix attaching flow dissector
  libbpf: Fix possible use after free in xsk_socket__delete
  libbpf: Fix null dereference in xsk_socket__delete
  libbpf, hashmap: Fix undefined behavior in hash_bits
  bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
  tools, bpftool: Remove two unused variables.
  tools, bpftool: Avoid array index warnings.
  xsk: Fix possible memory leak at socket close
  bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
  samples/bpf: Set rlimit for memlock to infinity in all samples
  bpf: Fix -Wshadow warnings
  selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>