Git Repo - linux.git/log

help_next should increase position index

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <[email protected]>
Signed-off-by: Mike Marshall <[email protected]>

Merge branch 'nvme-5.6' of git://git.infradead.org/nvme into block-5.6

Pull NVMe fixes from Keith.

* 'nvme-5.6' of git://git.infradead.org/nvme:
  nvmet: update AEN list and array at one place
  nvmet: Fix controller use after free
  nvmet: Fix error print message at nvmet_install_queue function
  nvme-pci: remove nvmeq->tags
  nvmet: fix dsm failure when payload does not match sgl descriptor
  nvmet: Pass lockdep expression to RCU lists

NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals

Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.

Commit 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.

This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.

This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.

This patch removes the implicit lease renewal from nfs4_do_fsinfo().

Fixes: 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFSv4: try lease recovery on NFS4ERR_EXPIRED

Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.

Fixes: 272289a3df72 ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

nvmet: update AEN list and array at one place

All async events are enqueued via nvmet_add_async_event() which
updates the ctrl->async_event_cmds[] array and additionally an struct
nvmet_async_event is added to the ctrl->async_events list.

Under normal operations the nvmet_async_event_work() updates again
the ctrl->async_event_cmds and removes the corresponding struct
nvmet_async_event from the list again. Though nvmet_sq_destroy() could
be called which calls nvmet_async_events_free() which only updates the
ctrl->async_event_cmds[] array.

Add new functions nvmet_async_events_process() and
nvmet_async_events_free() to process async events, update an array and
the list.

When we destroy submission queue after clearing the aen present on
the ctrl->async list we also loop over ctrl->async_event_cmds[] for
any requests posted by the host for which we don't have the AEN in
the ctrl->async_events list by calling nvmet_async_event_process()
and nvmet_async_events_free().

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
[[email protected]
* Loop over and clear out outstanding requests
* Update changelog
]
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

ALSA: hda - Fix DP-MST support for NVIDIA codecs

If dyn_pcm_assign is set, different jack objects are being created
for pcm and pins.

If dyn_pcm_assign is set, generic_hdmi_build_jack() calls into
add_hdmi_jack_kctl() to create and track separate jack object for
pcm. Like sync_eld_via_acomp(), hdmi_present_sense_via_verbs() also
need to report status change of the pcm jack.

Rename pin_idx_to_jack() to pin_idx_to_pcm_jack(). Update
hdmi_present_sense_via_verbs() to report plug state of pcm jack
object. Unlike sync_eld_via_acomp(), for !acomp drivers the pcm
jack's plug state must be consistent with plug state
of pin's jack.

Fixes: 5398e94fb753 ("ALSA: hda - Add DP-MST support for NVIDIA codecs")
Reported-and-tested-by: Martin Regner <[email protected]>
Signed-off-by: Nikhil Mahale <[email protected]>
Reviewed-by: Kai Vehmanen <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

nvmet: Fix controller use after free

After nvmet_install_queue() sets sq->ctrl calling to nvmet_sq_destroy()
reduces the controller refcount. In case nvmet_install_queue() fails,
calling to nvmet_ctrl_put() is done twice (at nvmet_sq_destroy and
nvmet_execute_io_connect/nvmet_execute_admin_connect) instead of once for
the queue which leads to use after free of the controller. Fix this by set
NULL at sq->ctrl in case of a failure at nvmet_install_queue().

The bug leads to the following Call Trace:

[65857.994862] refcount_t: underflow; use-after-free.
[65858.108304] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[65858.115557] RIP: 0010:refcount_warn_saturate+0xe5/0xf0
[65858.208141] Call Trace:
[65858.211203]  nvmet_sq_destroy+0xe1/0xf0 [nvmet]
[65858.216383]  nvmet_rdma_release_queue_work+0x37/0xf0 [nvmet_rdma]
[65858.223117]  process_one_work+0x167/0x370
[65858.227776]  worker_thread+0x49/0x3e0
[65858.232089]  kthread+0xf5/0x130
[65858.235895]  ? max_active_store+0x80/0x80
[65858.240504]  ? kthread_bind+0x10/0x10
[65858.244832]  ret_from_fork+0x1f/0x30
[65858.249074] ---[ end trace f82d59250b54beb7 ]---

Fixes: bb1cc74790eb ("nvmet: implement valid sqhd values in completions")
Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

nvmet: Fix error print message at nvmet_install_queue function

Place the arguments in the correct order.

Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

NFS: Fix memory leaks

In _nfs42_proc_copy(), 'res->commit_res.verf' is allocated through
kzalloc() if 'args->sync' is true. In the following code, if
'res->synchronous' is false, handle_async_copy() will be invoked. If an
error occurs during the invocation, the following code will not be executed
and the error will be returned . However, the allocated
'res->commit_res.verf' is not deallocated, leading to a memory leak. This
is also true if the invocation of process_copy_commit() returns an error.

To fix the above leaks, redirect the execution to the 'out' label if an
error is encountered.

Signed-off-by: Wenwen Wang <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

nfs: optimise readdir cache page invalidation

When the directory is large and it's being modified by one client
while another client is doing the 'ls -l' on the same directory then
the cache page invalidation from nfs_force_use_readdirplus causes
the reading client to keep restarting READDIRPLUS from cookie 0
which causes the 'ls -l' to take a very long time to complete,
possibly never completing.

Currently when nfs_force_use_readdirplus is called to switch from
READDIR to READDIRPLUS, it invalidates all the cached pages of the
directory. This cache page invalidation causes the next nfs_readdir
to re-read the directory content from cookie 0.

This patch is to optimise the cache invalidation in
nfs_force_use_readdirplus by only truncating the cached pages from
last page index accessed to the end the file. It also marks the
inode to delay invalidating all the cached page of the directory
until the next initial nfs_readdir of the next 'ls' instance.

Signed-off-by: Dai Ngo <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
[Anna - Fix conflicts with Trond's readdir patches]
[Anna - Remove redundant call to nfs_zap_mapping()]
[Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)]
Signed-off-by: Anna Schumaker <[email protected]>

drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage

Cull out 0 clocks to avoid a warning in DC.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency

Only send non-0 clocks to DC for validation. This mirrors
what the windows driver does.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)

We might get different numbers of clocks from powerplay depending
on what the OEM has populated.

v2: add assert for at least one level

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fetch default VDDC curve voltages (v2)

Ask the SMU for the default VDDC curve voltage values. This
properly reports the VDDC values in the OD interface.

v2: only update if the original values are 0

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.5.x

drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)

Previously, the syfs functionality for restoring the default powerplay
table was sourcing it's information from the currently-staged powerplay
table.

This patch adds a step to cache the first overdrive table that we see on
boot, so that it can be used later to "restore" the powerplay table

v2: sqaush my original with Matt's fix

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Signed-off-by: Matt Coffin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.5.x

drm/amdgpu/navi10: add OD_RANGE for navi overclocking

So users can see the range of valid values.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.5.x

drm/amdgpu/navi: fix index for OD MCLK

You can only adjust the max mclk, not the min.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.5.x

drm/amd/display: Fix HW/SW state mismatch

[Why]
When we disable a connector we don't explicitly remove it from the module so the
display is still cached(SW) in the hdcp_module.

SST: no issues because we can only have 1 display per link

MST: We have x displays per link, now if we disable 1 we don't remove it from the
module so the module has x display cached(SW).

If we try to enable HDCP, psp verification will fail because we are reporting x
displays while the HW only has x-1 display enabled

[How]
Check the callback for when we disable stream and call remove display.

Signed-off-by: Bhawanpreet Lakha <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix a typo when computing dsc configuration

[why]
Remove a backslash symbol accidentally left in increase bpp function
when computing mst dsc configuration.

Signed-off-by: Mikita Lipski <[email protected]>
Reviewed-by: Zhan Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: fix navi10 system intermittent reboot issue V2

This workaround is needed only for Navi10 12 Gbps SKUs.

V2: added SMU firmware version guard

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Only enable cursor on pipes that need it

[Why]
In current code we're essentially drawing the cursor on every pipe
that contains it. This only works when the planes have the same
scaling for src to dest rect, otherwise we'll get "double cursor" where
one cursor is incorrectly filtered and offset from the real position.

[How]
Without dedicated cursor planes on DCN we require at least one pipe
that matches the scaling of the current timing.

This is an optimization and workaround for the most common case where
the top-most plane is not scaled but the bottom-most plane is scaled.

Whenever a pipe has a parent pipe in the blending tree whose recout
fully contains the current pipe we can disable the pipe.

This only applies when the pipe is actually visible of course.

Signed-off-by: Nicholas Kazlauskas <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI

Recently, the to_pci_sysdata() helper was added inside the CONFIG_PCI
guard, but it is used inside a CONFIG_NUMA guard, which does not require
CONFIG_PCI. This breaks builds on !CONFIG_PCI machines. Make
to_pci_sysdata() available in all configurations.

Fixes: aad6aa0cd674 ("x86/PCI: Add to_pci_sysdata() helper")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Randy Dunlap <[email protected]> # build-tested

brd: check and limit max_part par

In brd_init func, rd_nr num of brd_device are firstly allocated
and add in brd_devices, then brd_devices are traversed to add each
brd_device by calling add_disk func. When allocating brd_device,
the disk->first_minor is set to i * max_part, if rd_nr * max_part
is larger than MINORMASK, two different brd_device may have the same
devt, then only one of them can be successfully added.
when rmmod brd.ko, it will cause oops when calling brd_exit.

Follow those steps:
  # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
  # rmmod brd
then, the oops will appear.

Oops log:
[  726.613722] Call trace:
[  726.614175]  kernfs_find_ns+0x24/0x130
[  726.614852]  kernfs_find_and_get_ns+0x44/0x68
[  726.615749]  sysfs_remove_group+0x38/0xb0
[  726.616520]  blk_trace_remove_sysfs+0x1c/0x28
[  726.617320]  blk_unregister_queue+0x98/0x100
[  726.618105]  del_gendisk+0x144/0x2b8
[  726.618759]  brd_exit+0x68/0x560 [brd]
[  726.619501]  __arm64_sys_delete_module+0x19c/0x2a0
[  726.620384]  el0_svc_common+0x78/0x130
[  726.621057]  el0_svc_handler+0x38/0x78
[  726.621738]  el0_svc+0x8/0xc
[  726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)

Here, we add brd_check_and_reset_par func to check and limit max_part par.

--
V5->V6:
- remove useless code

V4->V5:(suggested by Ming Lei)
- make sure max_part is not larger than DISK_MAX_PARTS

V3->V4:(suggested by Ming Lei)
- remove useless change
- add one limit of max_part

V2->V3: (suggested by Ming Lei)
- clear .minors when running out of consecutive minor space in brd_alloc
- remove limit of rd_nr

V1->V2:
- add more checks in brd_check_par_valid as suggested by Ming Lei.

Signed-off-by: Zhiqiang Liu <[email protected]>
Reviewed-by: Bob Liu <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

1) Use after free in rxrpc_put_local(), from David Howells.

2) Fix 64-bit division error in mlxsw, from Nathan Chancellor.

3) Make sure we clear various bits of TCP state in response to
    tcp_disconnect(). From Eric Dumazet.

4) Fix netlink attribute policy in cls_rsvp, from Eric Dumazet.

5) txtimer must be deleted in stmmac suspend(), from Nicolin Chen.

6) Fix TC queue mapping in bnxt_en driver, from Michael Chan.

7) Various netdevsim fixes from Taehee Yoo (use of uninitialized data,
    snapshot panics, stack out of bounds, etc.)

8) cls_tcindex changes hash table size after allocating the table, fix
    from Cong Wang.

9) Fix regression in the enforcement of session ID uniqueness in l2tp.
    We only have to enforce uniqueness for IP based tunnels not UDP
    ones. From Ridge Kennedy.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  gtp: use __GFP_NOWARN to avoid memalloc warning
  l2tp: Allow duplicate session creation with UDP
  r8152: Add MAC passthrough support to new device
  net_sched: fix an OOB access in cls_tcindex
  qed: Remove set but not used variable 'p_link'
  tc-testing: add missing 'nsPlugin' to basic.json
  tc-testing: fix eBPF tests failure on linux fresh clones
  net: hsr: fix possible NULL deref in hsr_handle_frame()
  netdevsim: remove unused sdev code
  netdevsim: use __GFP_NOWARN to avoid memalloc warning
  netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
  netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
  netdevsim: fix panic in nsim_dev_take_snapshot_write()
  netdevsim: disable devlink reload when resources are being used
  netdevsim: fix using uninitialized resources
  bnxt_en: Fix TC queue mapping.
  bnxt_en: Fix logic that disables Bus Master during firmware reset.
  bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.
  bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.
  net: stmmac: Delete txtimer in suspend()
  ...

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

- decompressor updates

- prevention of out-of-bounds access while stacktracing

- fix a section mismatch warning with free_memmap()

- make kexec depend on MMU to avoid some build errors

- remove swapops stubs

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8954/1: NOMMU: remove stubs for swapops
  ARM: 8952/1: Disable kmemleak on XIP kernels
  ARM: 8951/1: Fix Kexec compilation issue.
  ARM: 8949/1: mm: mark free_memmap as __init
  ARM: 8948/1: Prevent OOB access in stacktrace
  ARM: 8945/1: decompressor: use CONFIG option instead of cc-option
  ARM: 8942/1: Revert "8857/1: efi: enable CP15 DMB instructions before cleaning the cache"
  ARM: 8941/1: decompressor: enable CP15 barrier instructions in v7 cache setup code

Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"A pretty small batch for us, and apologies for it being a bit late, I
  wanted to sneak Christophe's user_access_begin() series in.

  Summary:

   - Implement user_access_begin() and friends for our platforms that
     support controlling kernel access to userspace.

   - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.

   - Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
     virtual machines) to use the IOMMU.

   - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
     VDSO, and some other improvements.

   - A series to use the PCI hotplug framework to control opencapi
     card's so that they can be reset and re-read after flashing a new
     FPGA image.

  As well as other minor fixes and improvements as usual.

  Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
  Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
  Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
  Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
  Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
  Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
  Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
  Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"

* tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
  powerpc: configs: Cleanup old Kconfig options
  powerpc/configs/skiroot: Enable some more hardening options
  powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
  powerpc/configs/skiroot: Enable security features
  powerpc/configs/skiroot: Update for symbol movement only
  powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
  powerpc/configs/skiroot: Drop HID_LOGITECH
  powerpc/configs: Drop NET_VENDOR_HP which moved to staging
  powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
  powerpc/configs: Drop CONFIG_QLGE which moved to staging
  powerpc: Do not consider weak unresolved symbol relocations as bad
  powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
  powerpc: indent to improve Kconfig readability
  powerpc: Provide initial documentation for PAPR hcalls
  powerpc: Implement user_access_save() and user_access_restore()
  powerpc: Implement user_access_begin and friends
  powerpc/32s: Prepare prevent_user_access() for user_access_end()
  powerpc/32s: Drop NULL addr verification
  powerpc/kuap: Fix set direction in allow/prevent_user_access()
  powerpc/32s: Fix bad_kuap_fault()
  ...

Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze update from Michal Simek:

- enable CMA

- add support for MB v11

- defconfig updates

- minor fixes

* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Add ID for Microblaze v11
  microblaze: Prevent the overflow of the start
  microblaze: Wire CMA allocator
  asm-generic: Make dma-contiguous.h a mandatory include/asm header
  microblaze: Sync defconfig with latest Kconfig layout
  microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
  microblaze: Align comments with register usage

Merge tag 'ovl-update-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

- Try to preserve holes in sparse files when copying up, thus saving
   disk space and improving performance.

- Fix a performance regression introduced in v4.19 by preserving
   asynchronicity of IO when fowarding to underlying layers. Add VFS
   helpers to submit async iocbs.

- Fix a regression in lseek(2) introduced in v4.19 that breaks >2G
   seeks on 32bit kernels.

- Fix a corner case where st_ino/st_dev was not preserved across copy
   up.

- Miscellaneous fixes and cleanups.

* tag 'ovl-update-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix lseek overflow on 32bit
  ovl: add splice file read write helper
  ovl: implement async IO routines
  vfs: add vfs_iocb_iter_[read|write] helper functions
  ovl: layer is const
  ovl: fix corner case of non-constant st_dev;st_ino
  ovl: fix corner case of conflicting lower layer uuid
  ovl: generalize the lower_fs[] array
  ovl: simplify ovl_same_sb() helper
  ovl: generalize the lower_layers[] array
  ovl: improving copy-up efficiency for big sparse file
  ovl: use ovl_inode_lock in ovl_llseek()
  ovl: use pr_fmt auto generate prefix
  ovl: fix wrong WARN_ON() in ovl_cache_update_ino()

gtp: use __GFP_NOWARN to avoid memalloc warning

gtp hashtable size is received by user-space.
So, this hashtable size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the gtp module.
So, this patch adds __GFP_NOWARN to avoid this message.

Splat looks like:
[ 2171.200049][ T1860] WARNING: CPU: 1 PID: 1860 at mm/page_alloc.c:4713 __alloc_pages_nodemask+0x2f3/0x740
[ 2171.238885][ T1860] Modules linked in: gtp veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv]
[ 2171.262680][ T1860] CPU: 1 PID: 1860 Comm: gtp-link Not tainted 5.5.0+ #321
[ 2171.263567][ T1860] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 2171.264681][ T1860] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[ 2171.265332][ T1860] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[ 2171.267301][ T1860] RSP: 0018:ffff8880b51af1f0 EFLAGS: 00010246
[ 2171.268320][ T1860] RAX: ffffed1016a35e43 RBX: 0000000000000000 RCX: 0000000000000000
[ 2171.269517][ T1860] RDX: 0000000000000000 RSI: 000000000000000b RDI: 0000000000000000
[ 2171.270305][ T1860] RBP: 0000000000040cc0 R08: ffffed1018893109 R09: dffffc0000000000
[ 2171.275973][ T1860] R10: 0000000000000001 R11: ffffed1018893108 R12: 1ffff11016a35e43
[ 2171.291039][ T1860] R13: 000000000000000b R14: 000000000000000b R15: 00000000000f4240
[ 2171.292328][ T1860] FS:  00007f53cbc83740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 2171.293409][ T1860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2171.294586][ T1860] CR2: 000055f540014508 CR3: 00000000b49f2004 CR4: 00000000000606e0
[ 2171.295424][ T1860] Call Trace:
[ 2171.295756][ T1860]  ? mark_held_locks+0xa5/0xe0
[ 2171.296659][ T1860]  ? __alloc_pages_slowpath+0x21b0/0x21b0
[ 2171.298283][ T1860]  ? gtp_encap_enable_socket+0x13e/0x400 [gtp]
[ 2171.298962][ T1860]  ? alloc_pages_current+0xc1/0x1a0
[ 2171.299475][ T1860]  kmalloc_order+0x22/0x80
[ 2171.299936][ T1860]  kmalloc_order_trace+0x1d/0x140
[ 2171.300437][ T1860]  __kmalloc+0x302/0x3a0
[ 2171.300896][ T1860]  gtp_newlink+0x293/0xba0 [gtp]
[ ... ]

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: Allow duplicate session creation with UDP

In the past it was possible to create multiple L2TPv3 sessions with the
same session id as long as the sessions belonged to different tunnels.
The resulting sessions had issues when used with IP encapsulated tunnels,
but worked fine with UDP encapsulated ones. Some applications began to
rely on this behaviour to avoid having to negotiate unique session ids.

Some time ago a change was made to require session ids to be unique across
all tunnels, breaking the applications making use of this "feature".

This change relaxes the duplicate session id check to allow duplicates
if both of the colliding sessions belong to UDP encapsulated tunnels.

Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation")
Signed-off-by: Ridge Kennedy <[email protected]>
Acked-by: James Chapman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ASoC: wcd934x: Add missing COMMON_CLK dependency

Looks like some platforms are not yet using COMMON CLK.

PowerPC allyesconfig failed with below error in next

ld: sound/soc/codecs/wcd934x.o:(.toc+0x0):
undefined reference to `of_clk_src_simple_get'
ld: sound/soc/codecs/wcd934x.o: in function `.wcd934x_codec_probe':
wcd934x.c:(.text.wcd934x_codec_probe+0x3d4):
undefined reference to `.__clk_get_name'
ld: wcd934x.c:(.text.wcd934x_codec_probe+0x438):
undefined reference to `.clk_hw_register'
ld: wcd934x.c:(.text.wcd934x_codec_probe+0x474):
undefined reference to `.of_clk_add_provider'

Add the missing COMMON_CLK dependency to fix this errors.

Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

r8152: Add MAC passthrough support to new device

Device 0xa387 also supports MAC passthrough, therefore add it to the
whitelst.

BugLink: https://bugs.launchpad.net/bugs/1827961/comments/30
Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: fix an OOB access in cls_tcindex

As Eric noticed, tcindex_alloc_perfect_hash() uses cp->hash
to compute the size of memory allocation, but cp->hash is
set again after the allocation, this caused an out-of-bound
access.

So we have to move all cp->hash initialization and computation
before the memory allocation. Move cp->mask and cp->shift together
as cp->hash may need them for computation too.

Reported-and-tested-by: [email protected]
Fixes: 331b72922c5f ("net: sched: RCU cls_tcindex")
Cc: Eric Dumazet <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

microblaze: Add ID for Microblaze v11

List Microblaze v11 from PVR.

Signed-off-by: Michal Simek <[email protected]>

microblaze: Prevent the overflow of the start

In case the start + cache size is more than the max int the
start overflows.
Prevent the same.

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Signed-off-by: Michal Simek <[email protected]>

microblaze: Wire CMA allocator

Based on commit 04e3543e228f ("microblaze: use the generic dma coherent
remap allocator")
CMA can be easily enabled by calling dma_contiguous_reserve() at the end of
mmu_init(). High limit is end of lowmem space which is completely unused at
this point of time.

Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

asm-generic: Make dma-contiguous.h a mandatory include/asm header

dma-continuguous.h is generic for all architectures except arm32 which has
its own version.

Similar change was done for msi.h by commit a1b39bae16a6
("asm-generic: Make msi.h a mandatory include/asm header")

Suggested-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Paul Walmsley <[email protected]> # for arch/riscv

microblaze: Sync defconfig with latest Kconfig layout

Layout was changed by commit 6210b6402f58
("kernel-hacking: group sysrq/kgdb/ubsan into 'Generic Kernel Debugging Instruments'")

Signed-off-by: Michal Simek <[email protected]>

microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers

As EXT4 filesystem driver is used for handling EXT2 file systems as
well. There is no need to enable EXT2 driver. This patch disables EXT2
and enables EXT3/EXT4 drivers.

Signed-off-by: Manish Narani <[email protected]>
Signed-off-by: Michal Simek <[email protected]>

MAINTAINERS: Remove the Bard Liao from the MAINTAINERS of Realtek CODECs

Remove the maintainer "Bard Liao" since he had quitted from Realtek.

Signed-off-by: Oder Chiou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

Merge tag 'rproc-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc

Pull remoteproc updates from Bjorn Andersson:
"This adds support for the Mediatek MT8183 SCP, modem remoteproc on
  Qualcomm SC7180 platform, audio and sensor remoteprocs on Qualcomm
  MSM8998 and audio, compute, modem and sensor remoteprocs on Qualcomm
  SM8150.

  It adds votes for necessary power-domains for all Qualcomm TrustZone
  based remoteproc instances are held, fixes a bug related to remoteproc
  drivers registering before the core has been initialized and does
  clean up the Qualcomm modem remoteproc driver"

* tag 'rproc-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (21 commits)
  remoteproc: qcom: q6v5-mss: Improve readability of reset_assert
  remoteproc: qcom: q6v5-mss: Use regmap_read_poll_timeout
  remoteproc: qcom: q6v5-mss: Rename boot status timeout
  remoteproc: qcom: q6v5-mss: Improve readability across clk handling
  remoteproc: use struct_size() helper
  remoteproc: Initialize rproc_class before use
  rpmsg: add rpmsg support for mt8183 SCP.
  remoteproc/mediatek: add SCP support for mt8183
  dt-bindings: Add a binding for Mediatek SCP
  remoteproc: mss: q6v5-mss: Add modem support on SC7180
  dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for SC7180
  remoteproc: qcom: pas: Add MSM8998 ADSP and SLPI support
  dt-bindings: remoteproc: qcom: Add ADSP and SLPI support for MSM8998 SoC
  remoteproc: q6v5-mss: Remove mem clk from the active pool
  remoteproc: qcom: Remove unneeded semicolon
  remoteproc: qcom: pas: Add auto_boot flag
  remoteproc: qcom: pas: Add SM8150 ADSP, CDSP, Modem and SLPI support
  dt-bindings: remoteproc: qcom: SM8150 Add ADSP, CDSP, MPSS and SLPI support
  remoteproc: qcom: pas: Vote for active/proxy power domains
  dt-bindings: remoteproc: qcom: Add power-domain bindings for Q6V5 PAS
  ...

Merge tag 'hwlock-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc

Pull hwspinlock updates from Bjorn Andersson:
"This continues the transition of drivers to device managed resources
  and removal of unnecessary PM runtime integration, with cleanups to
  the SIRF, OMAP and Qualcomm hwspinlock drivers.

  It also adds Baolin as reviewer in MAINTAINERS"

* tag 'hwlock-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
  hwspinlock: sirf: Use devm_hwspin_lock_register() to register hwlock controller
  hwspinlock: sirf: Remove redundant PM runtime functions
  hwspinlock: sirf: Change to use devm_platform_ioremap_resource()
  hwspinlock: omap: Use devm_kzalloc() to allocate memory
  hwspinlock: omap: Change to use devm_platform_ioremap_resource()
  hwspinlock: qcom: Use devm_hwspin_lock_register() to register hwlock controller
  hwspinlock: qcom: Remove redundant PM runtime functions
  hwspinlock: stm32: convert to devm_platform_ioremap_resource
  MAINTAINERS: Add myself as reviewer for the hwspinlock subsystem

qed: Remove set but not used variable 'p_link'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/qlogic/qed/qed_cxt.c: In function 'qed_qm_init_pf':
drivers/net/ethernet/qlogic/qed/qed_cxt.c:1401:29: warning:
variable 'p_link' set but not used [-Wunused-but-set-variable]

commit 92fae6fb231f ("qed: FW 8.42.2.0 Queue Manager changes")
leave behind this unused variable.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'unbreak-basic-and-bpf-tdc-testcases'

Davide Caratti says:

====================
unbreak 'basic' and 'bpf' tdc testcases

- patch 1/2 fixes tdc failures with 'bpf' action on fresch clones of the
kernel tree
- patch 2/2 allow running tdc for the 'basic' classifier without tweaking
tdc_config.py
====================

Signed-off-by: David S. Miller <[email protected]>

tc-testing: add missing 'nsPlugin' to basic.json

since tdc tests for cls_basic need $DEV1, use 'nsPlugin' so that the
following command can be run without errors:

[root@f31 tc-testing]# ./tdc.py -c basic

Fixes: 4717b05328ba ("tc-testing: Introduced tdc tests for basic filter")
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc-testing: fix eBPF tests failure on linux fresh clones

when the following command is done on a fresh clone of the kernel tree,

[root@f31 tc-testing]# ./tdc.py -c bpf

test cases that need to build the eBPF sample program fail systematically,
because 'buildebpfPlugin' is unable to install the kernel headers (i.e, the
'khdr' target fails). Pass the correct environment to 'make', in place of
ENVIR, to allow running these tests.

Fixes: 4c2d39bd40c1 ("tc-testing: use a plugin to build eBPF program")
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hsr: fix possible NULL deref in hsr_handle_frame()

hsr_port_get_rcu() can return NULL, so we need to be careful.

general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 1 PID: 10249 Comm: syz-executor.5 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
RIP: 0010:hsr_addr_is_self+0x86/0x330 net/hsr/hsr_framereg.c:44
Code: 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 6b ff 94 f9 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 02 00 00 48 8b 43 30 49 39 c6 49 89 47 c0 0f
RSP: 0018:ffffc90000da8a90 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff87e0cc33
RDX: 0000000000000006 RSI: ffffffff87e035d5 RDI: 0000000000000000
RBP: ffffc90000da8b20 R08: ffff88808e7de040 R09: ffffed1015d2707c
R10: ffffed1015d2707b R11: ffff8880ae9383db R12: ffff8880a689bc5e
R13: 1ffff920001b5153 R14: 0000000000000030 R15: ffffc90000da8af8
FS: 00007fd7a42be700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32338000 CR3: 00000000a928c000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
hsr_handle_frame+0x1c5/0x630 net/hsr/hsr_slave.c:31
__netif_receive_skb_core+0xfbc/0x30b0 net/core/dev.c:5099
__netif_receive_skb_one_core+0xa8/0x1a0 net/core/dev.c:5196
__netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5312
process_backlog+0x206/0x750 net/core/dev.c:6144
napi_poll net/core/dev.c:6582 [inline]
net_rx_action+0x508/0x1120 net/core/dev.c:6650
__do_softirq+0x262/0x98c kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
</IRQ>

Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
"The rest of MM and the rest of everything else: hotfixes, ipc, misc,
  procfs, lib, cleanups, arm"

* emailed patches from Andrew Morton <[email protected]>: (67 commits)
  ARM: dma-api: fix max_pfn off-by-one error in __dma_supported()
  treewide: remove redundant IS_ERR() before error code check
  include/linux/cpumask.h: don't calculate length of the input string
  lib: new testcases for bitmap_parse{_user}
  lib: rework bitmap_parse()
  lib: make bitmap_parse_user a wrapper on bitmap_parse
  lib: add test for bitmap_parse()
  bitops: more BITS_TO_* macros
  lib/string: add strnchrnul()
  proc: convert everything to "struct proc_ops"
  proc: decouple proc from VFS with "struct proc_ops"
  asm-generic/tlb: provide MMU_GATHER_TABLE_FREE
  asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER
  asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE
  asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
  asm-generic/tlb: add missing CONFIG symbol
  asm-gemeric/tlb: remove stray function declarations
  asm-generic/tlb: avoid potential double flush
  mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
  powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
  ...

Merge tag 'drm-next-2020-02-04' of git://anongit.freedesktop.org/drm/drm

Pull drm ttm/mm updates from Dave Airlie:
"Thomas Hellstrom has some more changes to the TTM layer that needed a
  patch to the mm subsystem.

  This adds a new mm API vmf_insert_mixed_prot to avoid an ugly hack
  that has limitations in the TTM layer"

* tag 'drm-next-2020-02-04' of git://anongit.freedesktop.org/drm/drm:
  mm, drm/ttm: Fix vm page protection handling
  mm: Add a vmf_insert_mixed_prot() function

Merge tag 'tag-chrome-platform-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Benson Leung:
"CrOS EC:

   - Refactoring of some of cros_ec's headers:

     include/linux/mfd/cros_ec.h now removed, new cros_ec.h added to
     drivers/platform/chrome which contains shared operations of cros_ec
     transport drivers.

   - Response tracing in cros_ec_proto

  Wilco EC:

   - Fix unregistration order.

   - Fix keyboard backlight probing on systems without keyboard
     backlight

   - Minor cleanup (newlines in printks, COMPILE_TEST)

  Misc:

   - chromeos_laptop converted to use i2c_new_scanned_device instead of
     i2c_new_probed_device"

* tag 'tag-chrome-platform-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec: Match implementation with headers
  platform/chrome: cros_ec: Drop unaligned.h include
  platform/chrome: wilco_ec: Allow wilco to be compiled in COMPILE_TEST
  platform/chrome: wilco_ec: Add newlines to printks
  platform/chrome: wilco_ec: Fix unregistration order
  cros_ec: treewide: Remove 'include/linux/mfd/cros_ec.h'
  platform/chrome: cros_ec_ishtp: Make init_lock static
  platform/chrome: chromeos_laptop: Convert to i2c_new_scanned_device
  platform/chrome: cros_ec_lpc: Use platform_get_irq_optional() for optional IRQs
  platform/chrome: cros_ec_proto: Add response tracing
  platform/chrome: cros_ec_trace: Match trace commands with EC commands

clk: qcom: Use ARRAY_SIZE in videocc-sc7180 for parent clocks

It's nicer to use ARRAY_SIZE instead of hardcoding. Had we always
been doing this it would have prevented a previous bug. See commit
74c31ff9c84a ("clk: qcom: gpu_cc_gmu_clk_src has 5 parents, not 6").

Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.13.If37e4b1b5553ac9db5ea51e84a6eec286cdf209e@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Get rid of the test clock for videocc-sc7180

The test clock isn't in the bindings and apparently it's not used by
anyone upstream. Remove it.

Suggested-by: Stephen Boyd <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.12.Ifd19a2701a102ec9f04e61a09345198383a9e937@changeid
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: clock: Cleanup qcom,videocc bindings for sdm845/sc7180

This makes the qcom,videocc bindings match the recent changes to the
dispcc and gpucc.

1. Switched to using "bi_tcxo" instead of "xo".

2. Adds a description for the XO clock.  Not terribly important but
   nice if it cleanly matches its cousins.

3. Updates the example to use the symbolic name for the RPMH clock and
   also show that the real devices are currently using 2 address cells
   / size cells and fixes the spacing on the closing brace.

4. Split into 2 files.  In this case they could probably share one
   file, but let's be consistent.

Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.11.I27bbd90045f38cd3218c259526409d52a48efb35@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Use ARRAY_SIZE in gpucc-sc7180 for parent clocks

It's nicer to use ARRAY_SIZE instead of hardcoding. Had we always
been doing this it would have prevented a previous bug. See commit
74c31ff9c84a ("clk: qcom: gpu_cc_gmu_clk_src has 5 parents, not 6").

Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.10.I3bf44e33f4dc7ecca10a50dbccb7dc082894fa59@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Get rid of the test clock for gpucc-sc7180

The test clock isn't in the bindings and apparently it's not used by
anyone upstream. Remove it.

Suggested-by: Stephen Boyd <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.9.I6d5276b768f6593053be036a3e70cce298d39f0c@changeid
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: clock: Fix qcom,gpucc bindings for sdm845/sc7180/msm8998

The qcom,gpucc bindings had a few problems with them:

1. When things were converted to yaml the name of the "gpll0 main"
   clock got changed from "gpll0" to "gpll0_main".  Change it back for
   msm8998.

2. Apparently there is a push not to use purist aliases for clocks but
   instead to just use the internal Qualcomm names.  For sdm845 and
   sc7180 (where the drivers haven't already been changed) move in
   this direction.

Things were also getting complicated harder to deal with by jamming
several SoCs into one file.  Splitting simplifies things.

Fixes: 5c6f3a36b913 ("dt-bindings: clock: Add YAML schemas for the QCOM GPUCC clock bindings")
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.7.I513cd73b16665065ae6c22cf594d8b543745e28c@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Use ARRAY_SIZE in dispcc-sc7180 for parent clocks

It's nicer to use ARRAY_SIZE instead of hardcoding. Had we always
been doing this it would have prevented a previous bug. See commit
74c31ff9c84a ("clk: qcom: gpu_cc_gmu_clk_src has 5 parents, not 6").

Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.6.If590c468722d2985cea63adf60c0d2b3098f37d9@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Get rid of the test clock for dispcc-sc7180

The test clock isn't in the bindings and apparently it's not used by
anyone upstream. Remove it.

Suggested-by: Stephen Boyd <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.5.I28ac8f801456f1b950f7da10ed0f74a1344d4a35@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: Get rid of fallback global names for dispcc-sc7180

In the new world input clocks should be matched by ".fw_name". sc7180
is new enough that no backward compatibility use of global names
should be needed. Remove it.

With a proper device tree and downstream display patches I have
verified booting a sc7180 up and seeing the display after this patch.

Fixes: dd3d06622138 ("clk: qcom: Add display clock controller driver for SC7180")
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.4.Ia3706a5d5add72e88dbff60fd13ec06bf7a2fd48@changeid
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: clock: Fix qcom,dispcc bindings for sdm845/sc7180

The qcom,dispcc bindings had a few problems with them:

1. They didn't specify all the clocks that dispcc is a client of.
   Specifically on sc7180 there are two clocks from the DSI PHY and
   two from the DP PHY.  On sdm845 there are actually two DSI PHYs
   (each of which has two clocks) and an extra clock from the gcc.
   These all need to be specified.

2. The sdm845.dtsi has existed for quite some time without specifying
   the clocks.  The Linux driver was relying on global names to match
   things up.  While we should transition things, it should be noted
   in the bindings.

3. The names used the bindings for "xo" and "gpll0" didn't match the
   names that QC used for these clocks internally and this was causing
   confusion / difficulty with their code generation tools.  Switched
   to the internal names to simplify everyone's lives.  It's not quite
   as clean in a purist sense but it should avoid headaches.  This
   officially changes the binding, but that seems OK in this case.

Also note that I updated the example.

Fixes: 5d28e44ba630 ("dt-bindings: clock: Add YAML schemas for the QCOM DISPCC clock bindings")
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.2.I0c4bbb0f75a0880cd4bd90d8b267271e2375e0d0@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: rcg2: Don't crash if our parent can't be found; return an error

When I got my clock parenting slightly wrong I ended up with a crash
that looked like this:

  Unable to handle kernel NULL pointer dereference at virtual
  address 0000000000000000
  ...
  pc : clk_hw_get_rate+0x14/0x44
  ...
  Call trace:
   clk_hw_get_rate+0x14/0x44
   _freq_tbl_determine_rate+0x94/0xfc
   clk_rcg2_determine_rate+0x2c/0x38
   clk_core_determine_round_nolock+0x4c/0x88
   clk_core_round_rate_nolock+0x6c/0xa8
   clk_core_round_rate_nolock+0x9c/0xa8
   clk_core_set_rate_nolock+0x70/0x180
   clk_set_rate+0x3c/0x6c
   of_clk_set_defaults+0x254/0x360
   platform_drv_probe+0x28/0xb0
   really_probe+0x120/0x2dc
   driver_probe_device+0x64/0xfc
   device_driver_attach+0x4c/0x6c
   __driver_attach+0xac/0xc0
   bus_for_each_dev+0x84/0xcc
   driver_attach+0x2c/0x38
   bus_add_driver+0xfc/0x1d0
   driver_register+0x64/0xf8
   __platform_driver_register+0x4c/0x58
   msm_drm_register+0x5c/0x60
   ...

It turned out that clk_hw_get_parent_by_index() was returning NULL and
we weren't checking.  Let's check it so that we don't crash.

Fixes: ac269395cdd8 ("clk: qcom: Convert to clk_hw based provider APIs")
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Matthias Kaehlcke <[email protected]>
Link: https://lkml.kernel.org/r/20200203103049.v4.1.I7487325fe8e701a68a07d3be8a6a4b571eca9cfa@changeid
Signed-off-by: Stephen Boyd <[email protected]>

clk: ls1028a: fix a dereference of pointer 'parent' before a null check

Currently the pointer 'parent' is being dereferenced before it is
being null checked. Fix this by performing the null check before
it is dereferenced.

Addresses-Coverity: ("Dereference before null check")
Fixes: d37010a3c162 ("clk: ls1028a: Add clock driver for Display output interface")
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

Merge tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
"The VL_READ and VL_CLR ioctls have been reworked to be more useful.
  This will not break userspace as there are very few users and they are
  using the integer value as a boolean.

  Apart from that, two drivers were reworked and a few fixes here and
  there for a net reduction of number of lines.

  Summary:

  Subsystem:
   - the VL_READ and VL_CLR ioctls are now documented and their behavior
     is unified across all the drivers.
   - RTC_I2C_AND_SPI Kconfig option rework to avoid selecting both
     REGMAP_I2C and REGMAP_SPI unecessarily.

  Drivers:
   - at91rm9200: remove deprecated procfs, add sam9x60, sama5d4 and
     sama5d2 compatibles.
   - cmos: solve lost interrupts issue on MS Surface 3
   - hym8563: return proper errno when time is invalid
   - rv3029: many fixes, nvram support"

* tag 'rtc-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (63 commits)
  dt-bindings: rtc: at91rm9200: document clocks property
  rtc: i2c/spi: Avoid inclusion of REGMAP support when not needed
  rtc: Kconfig: select REGMAP_I2C when necessary
  rtc: Kconfig: properly indent sd3078 entry
  rtc: cmos: Refactor code by using the new dmi_get_bios_year() helper
  rtc: cmos: Use predefined value for RTC IRQ on legacy x86
  rtc: cmos: Stop using shared IRQ
  rtc: tps6586x: Use IRQ_NOAUTOEN flag
  rtc: at91rm9200: use FIELD_PREP/FIELD_GET
  rtc: at91rm9200: avoid time readout in at91_rtc_setalarm
  rtc: at91rm9200: move register definitions to C file
  rtc: at91rm9200: add sama5d4 and sama5d2 compatibles
  dt-bindings: rtc: at91rm9200: convert bindings to json-schema
  rtc: at91rm9200: remove procfs information
  dt-bindings: atmel, at91rm9200-rtc: add microchip, sam9x60-rtc
  rtc: pcf8563: Use BIT
  rtc: moxart: Convert to SPDX identifier
  rtc: ds1343: Remove unused struct spi_device in struct ds1343_priv
  rtc: rx8025: Remove struct i2c_client from struct rx8025_data
  rtc: hym8563: Read the valid flag directly instead of caching it
  ...

ARM: dma-api: fix max_pfn off-by-one error in __dma_supported()

max_pfn, as set in arch/arm/mm/init.c:

    static void __init find_limits(unsigned long *min,
   unsigned long *max_low,
   unsigned long *max_high)
    {
    *max_low = PFN_DOWN(memblock_get_current_limit());
    *min = PFN_UP(memblock_start_of_DRAM());
    *max_high = PFN_DOWN(memblock_end_of_DRAM());
    }

with memblock_end_of_DRAM() pointing to the next byte after DRAM.  As
such, max_pfn points to the PFN after the end of DRAM.

Thus when using max_pfn to check DMA masks, we should subtract one when
checking DMA ranges against it.

Commit 8bf1268f48ad ("ARM: dma-api: fix off-by-one error in
__dma_supported()") fixed the same issue, but missed this spot.

This issue was found while working on the sun4i-csi v4l2 driver on the
Allwinner R40 SoC.  On Allwinner SoCs, DRAM is offset at 0x40000000, and
we are starting to use of_dma_configure() with the "dma-ranges" property
in the device tree to have the DMA API handle the offset.

In this particular instance, dma-ranges was set to the same range as the
actual available (2 GiB) DRAM.  The following error appeared when the
driver attempted to allocate a buffer:

    sun4i-csi 1c09000.csi: Coherent DMA mask 0x7fffffff (pfn 0x40000-0xc0000)
    covers a smaller range of system memory than the DMA zone pfn 0x0-0xc0001
    sun4i-csi 1c09000.csi: dma_alloc_coherent of size 307200 failed

Fixing the off-by-one error makes things work.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 11a5aa32562e ("ARM: dma-mapping: check DMA mask against available memory")
Fixes: 9f28cde0bc64 ("ARM: another fix for the DMA mapping checks")
Fixes: ab746573c405 ("ARM: dma-mapping: allow larger DMA mask than supported")
Signed-off-by: Chen-Yu Tsai <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Russell King <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: remove redundant IS_ERR() before error code check

'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p).
Hence, IS_ERR(p) is unneeded.

The semantic patch that generates this commit is as follows:

// <smpl>
@@
expression ptr;
constant error_code;
@@
-IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code)
+PTR_ERR(ptr) == - error_code
// </smpl>

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Cc: Julia Lawall <[email protected]>
Acked-by: Stephen Boyd <[email protected]> [drivers/clk/clk.c]
Acked-by: Bartosz Golaszewski <[email protected]> [GPIO]
Acked-by: Wolfram Sang <[email protected]> [drivers/i2c]
Acked-by: Rafael J. Wysocki <[email protected]> [acpi/scan.c]
Acked-by: Rob Herring <[email protected]>
Cc: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/cpumask.h: don't calculate length of the input string

New design of inner bitmap_parse() allows to avoid calculating the size of
a null-terminated string.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: new testcases for bitmap_parse{_user}

New version of bitmap_parse() is unified with bitmap_parse_list(),
and therefore:

- weakens rules on whitespaces and commas between hex chunks;

- in addition to

- allows passing UINT_MAX or any other big number as the length of input
string instead of actual string length.

The patch covers the cases.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: rework bitmap_parse()

bitmap_parse() is ineffective and full of opaque variables and opencoded
parts.  It leads to hard understanding and usage of it.  This rework
includes:

- remove bitmap_shift_left() call from the cycle.  Now it makes the
  complexity of the algorithm as O(nbits^2).  In the suggested approach
  the input string is parsed in reverse direction, so no shifts needed;

- relax requirement on a single comma and no white spaces between
  chunks.  It is considered useful in scripting, and it aligns with
  bitmap_parselist();

- split bitmap_parse() to small readable helpers;

- make an explicit calculation of the end of input line at the
  beginning, so users of the bitmap_parse() won't bother doing this.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: make bitmap_parse_user a wrapper on bitmap_parse

Currently we parse user data byte after byte which leads to
overcomplicating of parsing algorithm. There are no performance critical
users of bitmap_parse_user(), and so we can duplicate user data to kernel
buffer and simply call bitmap_parselist(). This rework lets us unify and
simplify bitmap_parse() and bitmap_parse_user(), which is done in the
following patch.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: add test for bitmap_parse()

The test is derived from bitmap_parselist() NO_LEN is reserved for use in
following patches.

[[email protected]: fix rebase issue]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix address space when test user buffer]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

bitops: more BITS_TO_* macros

Introduce BITS_TO_U64, BITS_TO_U32 and BITS_TO_BYTES as they are handy in
the following patches (BITS_TO_U32 specifically). Reimplement tools/
version of the macros according to the kernel implementation.

Also fix indentation for BITS_PER_TYPE definition.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/string: add strnchrnul()

Patch series "lib: rework bitmap_parse", v5.

Similarl to the recently revisited bitmap_parselist(), bitmap_parse() is
ineffective and overcomplicated.  This series reworks it, aligns its
interface with bitmap_parselist() and makes it simpler to use.

The series also adds a test for the function and fixes usage of it in
cpumask_parse() according to the new design - drops the calculating of
length of an input string.

bitmap_parse() takes the array of numbers to be put into the map in the BE
order which is reversed to the natural LE order for bitmaps.  For example,
to construct bitmap containing a bit on the position 42, we have to put a
line '400,0'.  Current implementation reads chunk one by one from the
beginning ('400' before '0') and makes bitmap shift after each successful
parse.  It makes the complexity of the whole process as O(n^2).  We can do
it in reverse direction ('0' before '400') and avoid shifting, but it
requires reverse parsing helpers.

This patch (of 7):

New function works like strchrnul() with a length limited string.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Amritha Nambiar <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Tobin C . Harding" <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Steffen Klassert <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: convert everything to "struct proc_ops"

The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

llseek => proc_lseek
unlocked_ioctl => proc_ioctl

xxx => proc_xxx

delete ".owner = THIS_MODULE" line

[[email protected]: fix drivers/isdn/capi/kcapi_proc.c]
[[email protected]: fix kernel/sched/psi.c]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: decouple proc from VFS with "struct proc_ops"

Currently core /proc code uses "struct file_operations" for custom hooks,
however, VFS doesn't directly call them.  Every time VFS expands
file_operations hook set, /proc code bloats for no reason.

Introduce "struct proc_ops" which contains only those hooks which /proc
allows to call into (open, release, read, write, ioctl, mmap, poll).  It
doesn't contain module pointer as well.

Save ~184 bytes per usage:

add/remove: 26/26 grow/shrink: 1/4 up/down: 1922/-6674 (-4752)
Function                                     old     new   delta
sysvipc_proc_ops                               -      72     +72
...
config_gz_proc_ops                             -      72     +72
proc_get_inode                               289     339     +50
proc_reg_get_unmapped_area                   110     107      -3
close_pdeo                                   227     224      -3
proc_reg_open                                289     284      -5
proc_create_data                              60      53      -7
rt_cpu_seq_fops                              256       -    -256
...
default_affinity_proc_fops                   256       -    -256
Total: Before=5430095, After=5425343, chg -0.09%

Link: http://lkml.kernel.org/r/20191225172228.GA13378@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: provide MMU_GATHER_TABLE_FREE

As described in the comment, the correct order for freeing pages is:

1) unhook page
2) TLB invalidate page
3) free page

This order equally applies to page directories.

Currently there are two correct options:

- use tlb_remove_page(), when all page directores are full pages and
   there are no futher contraints placed by things like software
   walkers (HAVE_FAST_GUP).

- use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
   architecture does not do IPI based TLB invalidate and has
   HAVE_FAST_GUP (or software TLB fill).

This however leaves architectures that don't have page based directories
but don't need RCU in a bind.  For those, provide MMU_GATHER_TABLE_FREE,
which provides the independent batching for directories without the
additional RCU freeing.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER

Towards a more consistent naming scheme.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE

Towards a more consistent naming scheme.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: rename HAVE_RCU_TABLE_FREE

Towards a more consistent naming scheme.

[[email protected]: fix sparc64 Kconfig]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: add missing CONFIG symbol

Without this the symbol will not actually end up in .config files.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a30e32bd79e9 ("asm-generic/tlb: Provide generic tlb_flush() based on flush_tlb_mm()")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-gemeric/tlb: remove stray function declarations

We removed the actual functions a while ago.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1808d65b55e4 ("asm-generic/tlb: Remove arch_tlb*_mmu()")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic/tlb: avoid potential double flush

Aneesh reported that:

tlb_flush_mmu()
  tlb_flush_mmu_tlbonly()
    tlb_flush() <-- #1
  tlb_flush_mmu_free()
    tlb_table_flush()
      tlb_table_invalidate()
tlb_flush_mmu_tlbonly()
  tlb_flush() <-- #2

does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
clear tlb->end in that case.

Observe that any caller to __tlb_adjust_range() also sets at least one of
the tlb->freed_tables || tlb->cleared_p* bits, and those are
unconditionally cleared by __tlb_reset_range().

Change the condition for actually issuing TLBI to having one of those bits
set, as opposed to having tlb->end != 0.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reported-by: "Aneesh Kumar K.V" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush

Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI.  This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table.  With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.

More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
false for sparc architecture.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Cc: <[email protected]> [4.14+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case

Patch series "Fixup page directory freeing", v4.

This is a repost of patch series from Peter with the arch specific changes
except ppc64 dropped.  ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes.  This makes it easy to backport
these changes.  Only the first 2 patches need to be backported to stable.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

1) unhook page/directory
2) TLB invalidate
3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is esp.  trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie.  caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.

This patch series fixes ppc64 and add generic MMU_GATHER changes to
support the conversion of other architectures.  I haven't added patches
w.r.t other architecture because they are yet to be acked.

This patch (of 9):

A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages.  In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the
!SMP case differently in the followup patch

!SMP case is right now broken for radix translation w.r.t page walk
cache flush.  We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm: avoid allocating struct mm_struct on the stack

struct mm_struct is quite large (~1664 bytes) and so allocating on the
stack may cause problems as the kernel stack size is small.

Since ptdump_walk_pgd_level_core() was only allocating the structure so
that it could modify the pgd argument we can instead introduce a pgd
override in struct mm_walk and pass this down the call stack to where it
is needed.

Since the correct mm_struct is now being passed down, it is now also
unnecessary to take the mmap_sem semaphore because ptdump_walk_pgd() will
now take the semaphore on the real mm.

[[email protected]: restore missed arm64 changes]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: ptdump: reduce level numbers by 1 in note_page()

Rather than having to increment the 'depth' number by 1 in ptdump_hole(),
let's change the meaning of 'level' in note_page() since that makes the
code simplier.

Note that for x86, the level numbers were previously increased by 1 in
commit 45dcd2091363 ("x86/mm/dump_pagetables: Fix printout of p4d level")
and the comment "Bit 7 has a different meaning" was not updated, so this
change also makes the code match the comment again.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: mm: display non-present entries in ptdump

Previously the /sys/kernel/debug/kernel_page_tables file would only show
lines for entries present in the page tables. However it is useful to
also show non-present entries as this makes the size and level of the
holes more visible. This aligns the behaviour with x86 which also shows
holes.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: mm: convert mm/dump.c to use walk_page_range()

Now walk_page_range() can walk kernel page tables, we can switch the arm64
ptdump code over to using it, simplifying the code.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm: convert dump_pagetables to use walk_page_range

Make use of the new functionality in walk_page_range to remove the arch
page walking code and use the generic code to walk the page tables.

The effective permissions are passed down the chain using new fields in
struct pg_state.

The KASAN optimisation is implemented by setting action=CONTINUE in the
callbacks to skip an entire tree of entries.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: add generic ptdump

Add a generic version of page table dumping that architectures can opt-in
to.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct

To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level_debugfs() need to pass in the mm_struct.

This means that ptdump_walk_pgd_level_core() is now always passed a valid
pgd, so drop the support for pgd==NULL.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct

To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level() need to pass an mm_struct rather than the raw
pgd_t pointer. Luckily since commit 7e904a91bf60 ("efi: Use efi_mm in x86
as well as ARM") we now have an mm_struct for EFI on x86.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm: point to struct seq_file from struct pg_state

mm/dump_pagetables.c passes both struct seq_file and struct pg_state down
the chain of walk_*_level() functions to be passed to note_page().
Instead place the struct seq_file in struct pg_state and access it from
struct pg_state (which is private to this file) in note_page().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: pagewalk: add 'depth' parameter to pte_hole

The pte_hole() callback is called at multiple levels of the page tables.
Code dumping the kernel page tables needs to know what at what depth the
missing entry is.  Add this is an extra parameter to pte_hole().  When the
depth isn't know (e.g.  processing a vma) then -1 is passed.

The depth that is reported is the actual level where the entry is missing
(ignoring any folding that is in place), i.e.  any levels where
PTRS_PER_P?D is set to 1 are ignored.

Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
natural numbers as levels 2/3/4.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Tested-by: Zong Li <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: pagewalk: fix termination condition in walk_pte_range()

If walk_pte_range() is called with a 'end' argument that is beyond the
last page of memory (e.g. ~0UL) then the comparison between 'addr' and
'end' will always fail and the loop will be infinite. Instead change the
comparison to >= while accounting for overflow.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: pagewalk: don't lock PTEs for walk_page_range_novma()

walk_page_range_novma() can be used to walk page tables or the kernel or
for firmware. These page tables may contain entries that are not backed
by a struct page and so it isn't (in general) possible to take the PTE
lock for the pte_entry() callback. So update walk_pte_range() to only
take the lock when no_vma==false by splitting out the inner loop to a
separate function and add a comment explaining the difference to
walk_page_range_novma().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: pagewalk: allow walking without vma

Since 48684a65b4e3: "mm: pagewalk: fix misbehavior of walk_page_range for
vma(VM_PFNMAP)", page_table_walk() will report any kernel area as a hole,
because it lacks a vma.

This means each arch has re-implemented page table walking when needed,
for example in the per-arch ptdump walker.

Remove the requirement to have a vma in the generic code and add a new
function walk_page_range_novma() which ignores the VMAs and simply walks
the page tables.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: pagewalk: add p4d_entry() and pgd_entry()

pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were no
users.  We're about to add users so reintroduce them, along with
p4d_entry() as we now have 5 levels of tables.

Note that commit a00cc7d9dd93d66a ("mm, x86: add support for PUD-sized
transparent hugepages") already re-added pud_entry() but with different
semantics to the other callbacks.  This commit reverts the semantics back
to match the other callbacks.

To support hmm.c which now uses the new semantics of pud_entry() a new
member ('action') of struct mm_walk is added which allows the callbacks to
either descend (ACTION_SUBTREE, the default), skip (ACTION_CONTINUE) or
repeat the callback (ACTION_AGAIN).  hmm.c is then updated to call
pud_trans_huge_lock() itself and make use of the splitting/retry logic of
the core code.

After this change pud_entry() is called for all entries, not just
transparent huge pages.

[[email protected]: fix unused variable warning]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: mm: add p?d_leaf() definitions

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For x86 we already have p?d_large() functions, so simply add macros to
provide the generic p?d_leaf() names for the generic code.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sparc: mm: add p?d_leaf() definitions

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For sparc 64 bit, pmd_large() and pud_large() are already provided, so add
macros to provide the p?d_leaf names required by the generic code.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Price <[email protected]>
Acked-by: David S. Miller <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Hogan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Liang, Kan" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zong Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>