Eric Dumazet [Wed, 21 Dec 2016 13:42:43 +0000 (05:42 -0800)]
tcp: add a missing barrier in tcp_tasklet_func()
Madalin reported crashes happening in tcp_tasklet_func() on powerpc64
Before TSQ_QUEUED bit is cleared, we must ensure the changes done
by list_del(&tp->tsq_node); are committed to memory, otherwise
corruption might happen, as an other cpu could catch TSQ_QUEUED
clearance too soon.
We can notice that old kernels were immune to this bug, because
TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
section, but they could have missed a kick to write additional bytes,
when NIC interrupts for a given flow are spread to multiple cpus.
Affected TCP flows would need an incoming ACK or RTO timer to add more
packets to the pipe. So overall situation should be better now.
Linus Torvalds [Wed, 21 Dec 2016 18:59:34 +0000 (10:59 -0800)]
splice: reinstate SIGPIPE/EPIPE handling
Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
caused a regression when there were no more readers left on a pipe that
was being spliced into: rather than the expected SIGPIPE and -EPIPE
return value, the writer would end up waiting forever for space to free
up (which obviously was not going to happen with no readers around).
Linus Torvalds [Wed, 21 Dec 2016 18:47:13 +0000 (10:47 -0800)]
Merge branch 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- add Kernel address space layout randomization support
- re-enable interrupts earlier now that we have a working IRQ stack
- optimize the timer interrupt function to better cope with missed
timer irqs
- fix error return code in parisc perf code (by Dan Carpenter)
- fix PAT debug code
* 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Optimize timer interrupt function
parisc: perf: return -EFAULT on error
parisc: Enhance CPU detection code on PAT machines
parisc: Re-enable interrupts early
parisc: Enable KASLR
Linus Torvalds [Wed, 21 Dec 2016 18:40:30 +0000 (10:40 -0800)]
Merge tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull more NFS client updates from Trond Myklebust:
"Highlights include:
- further attribute cache improvements to make revalidation more fine
grained
- NFSv4 locking improvements
Bugfixes:
- nfs4_fl_prepare_ds must be careful about reporting success in files
layout
- pNFS/flexfiles: Instead of marking a device inactive, remove it
from the cache"
* tag 'nfs-for-4.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES
NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES
NFSv4: Place the GETATTR operation before the CLOSE
NFSv4: Also ask for attributes when downgrading to a READ-only state
NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked()
pNFS: Return RW layouts on OPEN_DOWNGRADE
NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE
NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
NFSv4: ensure __nfs4_find_lock_state returns consistent result.
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
pNFS/flexfiles: delete deviceid, don't mark inactive
NFS: Clean up nfs_attribute_timeout()
NFS: Remove unused function nfs_revalidate_inode_rcu()
NFS: Fix and clean up the access cache validity checking
NFS: Only look at the change attribute cache state in nfs_weak_revalidate()
NFS: Clean up cache validity checking
NFS: Don't revalidate the file on close if we hold a delegation
NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN
NFSv4: Update the attribute cache info in update_changeattr
Thomas Petazzoni [Wed, 21 Dec 2016 10:28:49 +0000 (11:28 +0100)]
net: mvpp2: fix dma unmapping of TX buffers for fragments
Since commit 71ce391dfb784 ("net: mvpp2: enable proper per-CPU TX
buffers unmapping"), we are not correctly DMA unmapping TX buffers for
fragments.
Indeed, the mvpp2_txq_inc_put() function only stores in the
txq_cpu->tx_buffs[] array the physical address of the buffer to be
DMA-unmapped when skb != NULL. In addition, when DMA-unmapping, we use
skb_headlen(skb) to get the size to be unmapped. Both of this works fine
for TX descriptors that are associated directly to a SKB, but not the
ones that are used for fragments, with a NULL pointer as skb:
- We have a NULL physical address when calling DMA unmap
- skb_headlen(skb) crashes because skb is NULL
This causes random crashes when fragments are used.
To solve this problem, we need to:
- Store the physical address of the buffer to be unmapped
unconditionally, regardless of whether it is tied to a SKB or not.
- Store the length of the buffer to be unmapped, which requires a new
field.
Instead of adding a third array to store the length of the buffer to be
unmapped, and as suggested by David Miller, this commit refactors the
tx_buffs[] and tx_skb[] arrays of 'struct mvpp2_txq_pcpu' into a
separate structure 'mvpp2_txq_pcpu_buf', to which a 'size' field is
added. Therefore, instead of having three arrays to allocate/free, we
have a single one, which also improve data locality, reducing the
impact on the CPU cache.
Heiko Stübner [Tue, 20 Dec 2016 16:17:06 +0000 (17:17 +0100)]
net: ethernet: stmmac: dwmac-rk: make clk enablement first in powerup
Right now the dwmac-rk tries to set up the GRF-specific speed and link
options before enabling clocks, phys etc and on previous socs this works
because the GRF is supplied on the whole by one clock.
On the rk3399 however the GRF (General Register Files) clock-supply
has been split into multiple clocks and while there is no specific
grf-gmac clock like for other sub-blocks, it seems the mac-specific
portions are actually supplied by the general mac clock.
This results in hangs on rk3399 boards if the driver is build as module.
When built in te problem of course doesn't surface, as the clocks
are of course still on at the stage before clock_disable_unused.
To solve this, simply move the clock enablement to the first position
in the powerup callback. This is also a good idea in general to
enable clocks before everything else.
Tested on rk3288, rk3368 and rk3399 the dwmac still works on all of them.
Kalesh A P [Tue, 20 Dec 2016 15:14:30 +0000 (10:14 -0500)]
be2net: Increase skb headroom size to 256 bytes
The driver currently allocates 128 bytes of skb headroom.
This was found to be insufficient with some configurations
like Geneve tunnels, which resulted in skb head reallocations.
Linus Torvalds [Wed, 21 Dec 2016 18:16:05 +0000 (10:16 -0800)]
Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
Pull scsi target cleanups from Bart Van Assche:
"The changes here are:
- a few small bug fixes for the iSCSI and user space target drivers.
- minimize the target build time by about 30% by rearranging #include
directives
- fix the second argument passed to percpu_ida_alloc()
- reduce the number of false positive warnings reported by sparse
These patches pass Wu Fengguang's build bot tests and also the
linux-next tests"
* 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux:
iscsi-target: Return error if unable to add network portal
target: Fix spelling mistake and unwrap multi-line text
target/iscsi: Fix double free in lio_target_tiqn_addtpg()
target/user: Fix use-after-free of tcmu_cmds if they are expired
target: Minimize #include directives
target/user: Add an #include directive
cxgbit: Add an #include directive
ibmvscsi_tgt: Add two #include directives
sbp-target: Add an #include directive
qla2xxx: Add an #include directive
configfs: Minimize #include directives
usb: gadget: Fix second argument of percpu_ida_alloc()
sbp-target: Fix second argument of percpu_ida_alloc()
target/user: Fix a data type in tcmu_queue_cmd()
target: Use NULL instead of 0 to represent a pointer
The commit was intended to cover the race condition, but it introduced
yet another regression for devices with the implicit feedback, leading
to a kernel panic due to NULL-dereference in an irq context.
As the race condition that was addressed by the commit is very rare
and the regression is much worse, let's revert the commit for rc1, and
fix the issue properly in a later patch.
Paul Moore [Wed, 21 Dec 2016 15:39:25 +0000 (10:39 -0500)]
selinux: use the kernel headers when building scripts/selinux
Commit 3322d0d64f4e ("selinux: keep SELinux in sync with new capability
definitions") added a check on the defined capabilities without
explicitly including the capability header file which caused problems
when building genheaders for users of clang/llvm. Resolve this by
using the kernel headers when building genheaders, which is arguably
the right thing to do regardless, and explicitly including the
kernel's capability.h header file in classmap.h. We also update the
mdp build, even though it wasn't causing an error we really should
be using the headers from the kernel we are building.
Larry Finger [Tue, 20 Dec 2016 02:38:12 +0000 (20:38 -0600)]
rtlwifi: Fix kernel oops introduced with commit e49656147359
With commit e49656147359 {"rtlwifi: Use dev_kfree_skb_irq instead of
kfree_skb"), the method used to free an skb was changed because the
kfree_skb() was inside a spinlock. What was forgotten is that kfree_skb()
guards against a NULL value for the argument. Routine dev_kfree_skb_irq()
does not, and a test is needed to prevent kernel panics.
Tobias Klausmann [Tue, 13 Dec 2016 17:08:07 +0000 (18:08 +0100)]
ath9k: do not return early to fix rcu unlocking
Starting with commit d94a461d7a7d ("ath9k: use ieee80211_tx_status_noskb
where possible") the driver uses rcu_read_lock() && rcu_read_unlock(), yet on
returning early in ath_tx_edma_tasklet() the unlock is missing leading to stalls
and suspicious RCU usage:
When we switch to virtual addresses and, especially after
reserve_initrd()->relocate_initrd() have run, we have the updated initrd
address in initrd_start. Use initrd_start then instead of the address
which has been passed to us through boot params. (That still gets used
when we're running the very early routines on the BSP).
Paul Burton [Fri, 11 Nov 2016 14:22:36 +0000 (14:22 +0000)]
mmc: sd: Meet alignment requirements for raw_ssr DMA
The mmc_read_ssr() function results in DMA to the raw_ssr member of
struct mmc_card, which is not guaranteed to be cache line aligned & thus
might not meet the requirements set out in Documentation/DMA-API.txt:
Warnings: Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one (to prevent two separately mapped
regions from sharing a single cache line). Since the cache line size
may not be known at compile time, the API will not enforce this
requirement. Therefore, it is recommended that driver writers who
don't take special care to determine the cache line size at run time
only map virtual regions that begin and end on page boundaries (which
are guaranteed also to be cache line boundaries).
On some systems where DMA is non-coherent this can lead to us losing
data that shares cache lines with the raw_ssr array.
Fix this by kmalloc'ing a temporary buffer to perform DMA into. kmalloc
will ensure the buffer is suitably aligned, allowing the DMA to be
performed without any loss of data.
s3c64xx_cpufreq_config_regulator is incorrectly annotated
as __init, since the caller is also not init:
WARNING: vmlinux.o(.text+0x92fe1c): Section mismatch in reference from the function s3c64xx_cpufreq_driver_init() to the function .init.text:s3c64xx_cpufreq_config_regulator()
With modern gcc versions, the function gets inline, so we don't
see the warning, this only happens with gcc-4.6 and older.
Boris Ostrovsky [Thu, 15 Dec 2016 15:00:58 +0000 (10:00 -0500)]
cpufreq: Remove CPU hotplug callbacks only if they were initialized
Since CPU hotplug callbacks are requested for CPUHP_AP_ONLINE_DYN state,
successful callback initialization will result in cpuhp_setup_state()
returning a positive value. Therefore acpi_cpufreq_online being zero
indicates that callbacks have not been installed.
This means that acpi_cpufreq_boost_exit() should only remove them if
acpi_cpufreq_online is positive. Trying to call
cpuhp_remove_state_nocalls(0) will cause a BUG().
Boris Ostrovsky [Thu, 15 Dec 2016 15:00:57 +0000 (10:00 -0500)]
CPU/hotplug: Clarify description of __cpuhp_setup_state() return value
When ivoked with CPUHP_AP_ONLINE_DYN state __cpuhp_setup_state()
is expected to return positive value which is the hotplug state that
the routine assigns.
Since all users are cleaned up, remove the 2 deprecated APIs due to no
users.
As a Linux variable rather than an ACPICA variable, acpi_gbl_permanent_mmap
is renamed to acpi_permanent_mmap to have a consistent coding style across
entire Linux ACPI subsystem.
This patch removes the users of the deprectated APIs:
acpi_get_table_with_size()
early_acpi_os_unmap_memory()
The following APIs should be used instead of:
acpi_get_table()
acpi_put_table()
The deprecated APIs are invented to be a replacement of acpi_get_table()
during the early stage so that the early mapped pointer will not be stored
in ACPICA core and thus the late stage acpi_get_table() won't return a
wrong pointer. The mapping size is returned just because it is required by
early_acpi_os_unmap_memory() to unmap the pointer during early stage.
But as the mapping size equals to the acpi_table_header.length
(see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when
such a convenient result is returned, driver code will start to use it
instead of accessing acpi_table_header to obtain the length.
Thus this patch cleans up the drivers by replacing returned table size with
acpi_table_header.length, and should be a no-op.
FADT parsing code requires FADT to be installed as
ACPI_TABLE_ORIGIN_INTERNAL_PHYSICAL, using new
acpi_tb_get_table()/acpi_tb_put_table(), other address types can also be allowed,
thus facilitates FADT customization with virtual address. Lv Zheng.
This patch back ports Linux acpi_get_table_with_size() and
early_acpi_os_unmap_memory() into ACPICA upstream to reduce divergences.
The 2 APIs are used by Linux as table management APIs for long time, it
contains a hidden logic that during the early stage, the mapped tables
should be unmapped before the early stage ends.
During the early stage, tables are handled by the following sequence:
acpi_get_table_with_size();
parse the table
early_acpi_os_unmap_memory();
During the late stage, tables are handled by the following sequence:
acpi_get_table();
parse the table
Linux uses acpi_gbl_permanent_mmap to distinguish the early stage and the
late stage.
The reasoning of introducing acpi_get_table_with_size() is: ACPICA will
remember the early mapped pointer in acpi_get_table() and Linux isn't able to
prevent ACPICA from using the wrong early mapped pointer during the late
stage as there is no API provided from ACPICA to be an inverse of
acpi_get_table() to forget the early mapped pointer.
But how ACPICA can work with the early/late stage requirement? Inside of
ACPICA, tables are ensured to be remained in "INSTALLED" state during the
early stage, and they are carefully not transitioned to "VALIDATED" state
until the late stage. So the same logic is in fact implemented inside of
ACPICA in a different way. The gap is only that the feature is not provided
to the OSPMs in an accessible external API style.
It then is possible to fix the gap by providing an inverse of
acpi_get_table() from ACPICA, so that the two Linux sequences can be
combined:
acpi_get_table();
parse the table
acpi_put_table();
In order to work easier with the current Linux code, acpi_get_table() and
acpi_put_table() is implemented in a usage counting based style:
1. When the usage count of the table is increased from 0 to 1, table is
mapped and .Pointer is set with the mapping address (VALIDATED);
2. When the usage count of the table is decreased from 1 to 0, .Pointer
is unset and the mapping address is unmapped (INVALIDATED).
So that we can deploy the new APIs to Linux with minimal effort by just
invoking acpi_get_table() in acpi_get_table_with_size() and invoking
acpi_put_table() in early_acpi_os_unmap_memory(). Lv Zheng.
Pull networking fixes and cleanups from David Miller:
1) Use rb_entry() instead of hardcoded container_of(), from Geliang
Tang.
2) Use correct memory barriers in stammac driver, from Pavel Machek.
3) Fix assoc bind address handling in SCTP, from Xin Long.
4) Make the length check for UFO handling consistent between
__ip_append_data() and ip_finish_output(), from Zheng Li.
5) HSI driver compatible strings were busted fro hix5hd2, from Dongpo
Li.
6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
Yadav.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
RDS: use rb_entry()
net_sched: sch_netem: use rb_entry()
net_sched: sch_fq: use rb_entry()
net/mlx5: use rb_entry()
ethernet: sfc: Add Kconfig entry for vendor Solarflare
sctp: not copying duplicate addrs to the assoc's bind address list
sctp: reduce indent level in sctp_copy_local_addr_list
ARM: dts: hix5hd2: don't change the existing compatible string
net: hix5hd2_gmac: fix compatible strings name
openvswitch: Add a missing break statement.
net: netcp: ethss: fix 10gbe host port tx pri map configuration
net: netcp: ethss: fix errors in ethtool ops
fsl/fman: enable compilation on ARM64
fsl/fman: A007273 only applies to PPC SoCs
powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
fsl/fman: fix 1G support for QSGMII interfaces
dt: bindings: net: use boolean dt properties for eee broken modes
net: phy: use boolean dt properties for eee broken modes
net: phy: fix sign type error in genphy_config_eee_advert
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
...
Linus Torvalds [Tue, 20 Dec 2016 23:24:32 +0000 (15:24 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge final set of updates from Andrew Morton:
- a series to make IMA play better across kexec
- a handful of random fixes
* emailed patches from Andrew Morton <[email protected]>:
printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
ratelimit: fix WARN_ON_RATELIMIT return value
kcov: make kcov work properly with KASLR enabled
arm64: setup: introduce kaslr_offset()
mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
ima: platform-independent hash value
ima: define a canonical binary_runtime_measurements list format
ima: support restoring multiple template formats
ima: store the builtin/custom template definitions in a list
ima: on soft reboot, save the measurement list
powerpc: ima: send the kexec buffer to the next kernel
ima: maintain memory size needed for serializing the measurement list
ima: permit duplicate measurement list entries
ima: on soft reboot, restore the measurement list
powerpc: ima: get the kexec buffer passed by the previous kernel
Linus Torvalds [Tue, 20 Dec 2016 23:17:55 +0000 (15:17 -0800)]
Merge tag 'doc-4.10-3' of git://git.lwn.net/linux
Pull documentation fix from Jonathan Corbet:
"A single fix for the build system.
It would appear that the docutils developers, in their wisdom, broke
the API in the 0.13 release. This fix detects the breakage and allows
the docs to be built with both the old and new versions"
* tag 'doc-4.10-3' of git://git.lwn.net/linux:
docs: sphinx-extensions: make rstFlatTable work with docutils 0.13
Linus Torvalds [Tue, 20 Dec 2016 23:16:00 +0000 (15:16 -0800)]
Merge tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull arch/microblaze updates from Michal Simek:
- wire-up new syscalls
- add new codes and fpga families
- fix a return value
* tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add new fpga families
microblaze: Add missing release version code v9.6 and v10
microblaze: Add missing syscalls
microblaze: Fix return value from xilinx_timer_init
- clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it,
remove unused declarations, fix screen_info definition
* tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: update DMA-related Documentation/features entries
xtensa: configure shared DMA pool reservation in kc705 DTS
xtensa: enable HAVE_DMA_CONTIGUOUS
xtensa: move S32C1I self-test to a separate file
xtensa: fix screen_info, clean up unused declarations in setup.c
Helge Deller [Tue, 20 Dec 2016 19:51:10 +0000 (20:51 +0100)]
parisc: Optimize timer interrupt function
Restructure the timer interrupt function to better cope with missed timer irqs.
Optimize the calculation when the next interrupt should happen and skip irqs if
they would happen too shortly after exit of the irq function.
The update_process_times() call is done anyway at every timer irq, so we can
safely drop the prof_counter and prof_multiplier variables from the per_cpu
structure.
Tobias Klauser [Tue, 20 Dec 2016 13:38:26 +0000 (14:38 +0100)]
ethernet: sfc: Add Kconfig entry for vendor Solarflare
Since commit
5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
there are two drivers for Solarflare devices, but both still show up
directly beneath "Ethernet driver support" in the Kconfig. Follow the
pattern of other vendors and group them beneath an own vendor Kconfig
entry for Solarflare.
Xin Long [Tue, 20 Dec 2016 05:49:50 +0000 (13:49 +0800)]
sctp: not copying duplicate addrs to the assoc's bind address list
sctp.local_addr_list is a global address list that is supposed to include
all the local addresses. sctp updates this list according to NETDEV_UP/
NETDEV_DOWN notifications.
However, if multiple NICs have the same address, the global list would
have duplicate addresses. Even if for one NIC, promote secondaries in
__inet_del_ifa can also lead to accumulating duplicate addresses.
When sctp binds address 'ANY' and creates a connection, it copies all
the addresses from global list into asoc's bind addr list, which makes
sctp pack the duplicate addresses into INIT/INIT_ACK packets.
This patch is to filter the duplicate addresses when copying the addrs
from global list in sctp_copy_local_addr_list and unpacking addr_param
from cookie in sctp_raw_to_bind_addrs to asoc's bind addr list.
Note that we can't filter the duplicate addrs when global address list
gets updated, As NETDEV_DOWN event may remove an addr that still exists
in another NIC.
David S. Miller [Tue, 20 Dec 2016 19:12:30 +0000 (14:12 -0500)]
Merge branch 'hix5hd2_gmac-compatible-string'
Dongpo Li says:
====================
net: hix5hd2_gmac: keep the compatible string not changed
This patch series fix the patch: d0fb6ba75dc0 ("net: hix5hd2_gmac: add generic compatible string")
The SoC hix5hd2 compatible string has the suffix "-gmac" and
we should not change its compatible string.
So we should name all the compatible string with the suffix "-gmac".
Creating a new name suffix "-gemac" is unnecessary.
====================
Dongpo Li [Tue, 20 Dec 2016 02:09:29 +0000 (10:09 +0800)]
ARM: dts: hix5hd2: don't change the existing compatible string
The SoC hix5hd2 compatible string has the suffix "-gmac" and
we should not change it.
We should only add the generic compatible string "hisi-gmac-v1".
Fixes: 0855950ba580 ("ARM: dts: hix5hd2: add gmac generic compatible and clock names") Signed-off-by: Dongpo Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dongpo Li [Tue, 20 Dec 2016 02:09:28 +0000 (10:09 +0800)]
net: hix5hd2_gmac: fix compatible strings name
The SoC hix5hd2 compatible string has the suffix "-gmac" and
we should not change its compatible string.
So we should name all the compatible string with the suffix "-gmac".
Creating a new name suffix "-gemac" is unnecessary.
We also add another SoC compatible string in dt binding documentation
and describe which generic version the SoC belongs to.
Fixes: d0fb6ba75dc0 ("net: hix5hd2_gmac: add generic compatible string") Signed-off-by: Dongpo Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jarno Rajahalme [Tue, 20 Dec 2016 01:06:33 +0000 (17:06 -0800)]
openvswitch: Add a missing break statement.
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.
WingMan Kwok [Mon, 19 Dec 2016 22:55:56 +0000 (17:55 -0500)]
net: netcp: ethss: fix errors in ethtool ops
In ethtool ops, it needs to retrieve the corresponding
ethss module (gbe or xgbe) from the net_device structure.
Prior to this patch, the retrieving procedure only
checks for the gbe module. This patch fixes the issue
by checking the xgbe module if the net_device structure
does not correspond to the gbe module.
David S. Miller [Tue, 20 Dec 2016 18:55:35 +0000 (13:55 -0500)]
Merge branch 'fsl-fixes'
Madalin Bucur says:
====================
fsl/fman: fixes for ARM
The patch set fixes advertised speeds for QSGMII interfaces, disables A007273 erratum workaround on non-PowerPC platforms where it does not
apply, enables compilation on ARM64 and addresses a probing issue on
non PPC platforms.
Changes from v3: removed redundant comment, added ack by Scott
Changes from v2: merged fsl/fman changes to avoid a point of failure
Changes from v1: unifying probing on all supported platforms
====================
David S. Miller [Tue, 20 Dec 2016 18:50:51 +0000 (13:50 -0500)]
Merge branch 'phy-broken-modes'
Jerome Brunet says:
====================
phy: Fix integration of eee-broken-modes
The purpose of this series is to fix the integration of the ethernet phy
property "eee-broken-modes" [0]
The v3 of this series has been merged, missing a fix (error reported by
kbuild robot) available in the v4 [1]
More importantly, Florian opposed adding a DT property mapping a device
register this directly [2]. The concern was that the property could be
abused to implement platform configuration policy. After discussing it,
I think we agreed that such information about the HW (defect) should appear
in the platform DT. However, the preferred way is to add a boolean property
for each EEE broken mode.
jbrunet [Mon, 19 Dec 2016 15:05:38 +0000 (16:05 +0100)]
dt: bindings: net: use boolean dt properties for eee broken modes
The patches regarding eee-broken-modes was merged before all people
involved could find an agreement on the best way to move forward.
While we agreed on having a DT property to mark particular modes as broken,
the value used for eee-broken-modes mapped the phy register in very direct
way. Because of this, the concern is that it could be used to implement
configuration policies instead of describing a broken HW.
In the end, having a boolean property for each mode seems to be preferred
over one bit field value mapping the register (too) directly.
jbrunet [Mon, 19 Dec 2016 15:05:37 +0000 (16:05 +0100)]
net: phy: use boolean dt properties for eee broken modes
The patches regarding eee-broken-modes was merged before all people
involved could find an agreement on the best way to move forward.
While we agreed on having a DT property to mark particular modes as broken,
the value used for eee-broken-modes mapped the phy register in very direct
way. Because of this, the concern is that it could be used to implement
configuration policies instead of describing a broken HW.
In the end, having a boolean property for each mode seems to be preferred
over one bit field value mapping the register (too) directly.
jbrunet [Mon, 19 Dec 2016 15:05:36 +0000 (16:05 +0100)]
net: phy: fix sign type error in genphy_config_eee_advert
In genphy_config_eee_advert, the return value of phy_read_mmd_indirect is
checked to know if the register could be accessed but the result is
assigned to a 'u32'.
Changing to 'int' to correctly get errors from phy_read_mmd_indirect.
Fixes: d853d145ea3e ("net: phy: add an option to disable EEE advertisement") Reported-by: Julia Lawall <[email protected]> Signed-off-by: Jerome Brunet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Johannes Weiner [Tue, 20 Dec 2016 00:23:03 +0000 (16:23 -0800)]
mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
When FADV_DONTNEED cannot drop all pages in the range, it observes that
some pages might still be on per-cpu LRU caches after recent
instantiation and so initiates remote calls to all CPUs to flush their
local caches. However, in most cases, the fadvise happens from the same
context that instantiated the pages, and any pre-LRU pages in the
specified range are most likely sitting on the local CPU's LRU cache,
and so in many cases this results in unnecessary remote calls, which, in
a loaded system, can hold up the fadvise() call significantly.
[ I didn't record it in the extreme case we observed at Facebook,
unfortunately. We had a slow-to-respond system and noticed it
lru_add_drain_all() leading the profile during fadvise calls. This
patch came out of thinking about the code and how we commonly call
FADV_DONTNEED.
FWIW, I wrote a silly directory tree walker/searcher that recurses
through /usr to read and FADV_DONTNEED each file it finds. On a 2
socket 40 ht machine, over 1% is spent in lru_add_drain_all(). With
the patch, that cost is gone; the local drain cost shows at 0.09%. ]
Try to avoid the remote call by flushing the local LRU cache before even
attempting to invalidate anything. It's a cheap operation, and the
local LRU cache is the most likely to hold any pre-LRU pages in the
specified fadvise range.
Andreas Steffen [Tue, 20 Dec 2016 00:23:00 +0000 (16:23 -0800)]
ima: platform-independent hash value
For remote attestion it is important for the ima measurement values to
be platform-independent. Therefore integer fields to be hashed must be
converted to canonical format.
Mimi Zohar [Tue, 20 Dec 2016 00:22:57 +0000 (16:22 -0800)]
ima: define a canonical binary_runtime_measurements list format
The IMA binary_runtime_measurements list is currently in platform native
format.
To allow restoring a measurement list carried across kexec with a
different endianness than the targeted kernel, this patch defines
little-endian as the canonical format. For big endian systems wanting
to save/restore the measurement list from a system with a different
endianness, a new boot command line parameter named "ima_canonical_fmt"
is defined.
Considerations: use of the "ima_canonical_fmt" boot command line option
will break existing userspace applications on big endian systems
expecting the binary_runtime_measurements list to be in platform native
format.
Mimi Zohar [Tue, 20 Dec 2016 00:22:54 +0000 (16:22 -0800)]
ima: support restoring multiple template formats
The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measuremement list containing
multiple builtin/custom template formats.
Mimi Zohar [Tue, 20 Dec 2016 00:22:51 +0000 (16:22 -0800)]
ima: store the builtin/custom template definitions in a list
The builtin and single custom templates are currently stored in an
array. In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list. This will permit
defining more than one custom template per boot.
Mimi Zohar [Tue, 20 Dec 2016 00:22:48 +0000 (16:22 -0800)]
ima: on soft reboot, save the measurement list
The TPM PCRs are only reset on a hard reboot. In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.
This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.
Mimi Zohar [Tue, 20 Dec 2016 00:22:38 +0000 (16:22 -0800)]
ima: permit duplicate measurement list entries
Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list. This patch adds
support for allowing duplicate measurements.
The "boot_aggregate" measurement entry is the delimiter between soft
boots.
Mimi Zohar [Tue, 20 Dec 2016 00:22:35 +0000 (16:22 -0800)]
ima: on soft reboot, restore the measurement list
The TPM PCRs are only reset on a hard reboot. In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot. This
patch restores the measurement list.
powerpc: ima: get the kexec buffer passed by the previous kernel
Patch series "ima: carry the measurement list across kexec", v8.
The TPM PCRs are only reset on a hard reboot. In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
list of the running kernel must be saved and then restored on the
subsequent boot, possibly of a different architecture.
The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.
Up to now, the binary_runtime_measurements was defined as architecture
native format. The assumption being that userspace could and would
handle any architecture conversions. With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash. To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.
The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).
A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included in
this patch set. The simplified method requires all file measurements be
taken prior to executing the kexec load, as subsequent measurements will
not be carried across the kexec and restored.
This patch (of 10):
The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
The second kernel can check whether the previous kernel sent the buffer
and retrieve it.
This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel. It will be used in the
next patch.
The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.
zheng li [Mon, 12 Dec 2016 01:56:05 +0000 (09:56 +0800)]
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.
That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.
Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.
Joe Stringer [Fri, 9 Dec 2016 02:46:20 +0000 (18:46 -0800)]
samples/bpf: Move open_raw_sock to separate header
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
<SNIP>
HOSTCC samples/bpf/fds_example.o
/git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
/git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
insns, insns_cnt, "GPL", 0,
^~~~~
In file included from /git/linux/samples/bpf/libbpf.h:5:0,
from /git/linux/samples/bpf/bpf_load.h:4,
from /git/linux/samples/bpf/fds_example.c:15:
/git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
^~~~~~~~~~~~~~~~
HOSTCC samples/bpf/sockex1_user.o
So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.
Joe Stringer [Wed, 14 Dec 2016 22:05:26 +0000 (14:05 -0800)]
tools lib bpf: Add bpf_prog_{attach,detach}
Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.
Committer notes:
Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.
Joe Stringer [Wed, 14 Dec 2016 22:43:39 +0000 (14:43 -0800)]
samples/bpf: Switch over to libbpf
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.
Committer notes:
Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
<SNIP>
[root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
make[1]: Entering directory '/tmp/build/linux'
CHK include/config/kernel.release
HOSTCC scripts/basic/fixdep
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /git/linux as source for kernel
CHK include/generated/utsrelease.h
HOSTCC scripts/basic/bin2c
HOSTCC arch/x86/tools/relocs_32.o
HOSTCC arch/x86/tools/relocs_64.o
LD samples/bpf/built-in.o
<SNIP>
HOSTCC samples/bpf/fds_example.o
HOSTCC samples/bpf/sockex1_user.o
/git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
/git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
insns, insns_cnt, "GPL", 0,
^~~~~
In file included from /git/linux/samples/bpf/libbpf.h:5:0,
from /git/linux/samples/bpf/bpf_load.h:4,
from /git/linux/samples/bpf/fds_example.c:15:
/git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
^~~~~~~~~~~~~~~~
HOSTCC samples/bpf/sockex2_user.o
<SNIP>
HOSTCC samples/bpf/xdp_tx_iptunnel_user.o
clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
HOSTLD samples/bpf/tc_l2_redirect
<SNIP>
HOSTLD samples/bpf/lwt_len_hist
HOSTLD samples/bpf/xdp_tx_iptunnel
make[1]: Leaving directory '/tmp/build/linux'
[root@f5065a7d6272 linux]#
And then, in the host:
[root@jouet bpf]# mount | grep "docker.*devicemapper\/"
/dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
[root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
[root@jouet bpf]# file offwaketime
offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
[root@jouet bpf]# readelf -SW offwaketime
offwaketime offwaketime_kern.o offwaketime_user.o
[root@jouet bpf]# readelf -SW offwaketime_kern.o
There are 11 section headers, starting at offset 0x700:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
[ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8
[ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
[ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8
[ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
[ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
[ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
[10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
[root@jouet bpf]# ./offwaketime | head -3
qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
[root@jouet bpf]#
Kan Liang [Tue, 13 Dec 2016 15:29:44 +0000 (10:29 -0500)]
perf diff: Do not overwrite valid build id
Fixes a perf diff regression issue which was introduced by commit 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files
based on a build ID")
The binary name could be same when perf diff different binaries. Build
id is used to distinguish between them.
However, the previous patch assumes the same binary name has same build
id. So it overwrites the build id according to the binary name,
regardless of whether the build id is set or not.
Check the has_build_id in dso__load. If the build id is already set, use
it.
Ravi Bangoria [Tue, 22 Nov 2016 08:40:50 +0000 (14:10 +0530)]
perf annotate: Don't throw error for zero length symbols
'perf report --tui' exits with error when it finds a sample of zero
length symbol (i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.
Paulo Zanoni [Wed, 14 Dec 2016 14:55:37 +0000 (12:55 -0200)]
drm/i915: skip the first 4k of stolen memory on everything >= gen8
BSpec got updated and this workaround is now listed as standard
required programming for all subsequent projects. This is confirmed to
fix Skylake screen flickering issues (probably caused by the fact that
we initialized a ring in the first page of stolen, but I didn't 100%
confirm this theory).
v2: this is the patch that fixes the screen flickering, document it.
Chris Wilson [Mon, 19 Dec 2016 12:43:45 +0000 (12:43 +0000)]
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:
i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
Tvrtko Ursulin [Fri, 16 Dec 2016 13:18:42 +0000 (13:18 +0000)]
drm/i915: Fix use after free in logical_render_ring_init
Commit 3b3f1650b1ca ("drm/i915: Allocate intel_engine_cs
structure only for the enabled engines") introduced the
dynanically allocated engine instances and created an
potential use after free scenario in logical_render_ring_init
where lrc_destroy_wa_ctx_obj could be called after the engine
instance has been freed.
This can only happen during engine setup/init error handling
which luckily does not happen ever in practice.
Fix is to not call lrc_destroy_wa_ctx_obj since it would have
already been executed from the preceding engine cleanup.
Paulo Zanoni [Tue, 13 Dec 2016 20:57:44 +0000 (18:57 -0200)]
drm/i915: disable PSR by default on HSW/BDW
We've been ignoring the poor bugzilla reporters that say PSR causes
system lockups and all other sorts of problems. The earliest bug
report is from April, so I think we can use the "revert the offending
commit if no fixes are presented within 8 months" rule here.
Fixes: 9b58e352b463 ("drm/i915: Enable PSR by default on Haswell and Broadwell.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97602
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97515
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96736
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96704
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96569
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95176
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94985 Cc: <[email protected]> # v4.6+ Cc: Rodrigo Vivi <[email protected]> Cc: Jim Bride <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]> Acked-by: Rodrigo Vivi <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Acked-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2ee7dc497e348eecbb82adbb1ea9e9a7e29fe921) Signed-off-by: Jani Nikula <[email protected]>
Mika Kuoppala [Wed, 14 Dec 2016 12:26:20 +0000 (14:26 +0200)]
drm/i915: Fix setting of boost freq tunable
For limiting the max frequency of gpu, the max freq tunable
is not enough to hard limit the max gap. We now have also per
client boost max freq. When this tunable was introduced,
it was mistakenly made read only. Allow user to gain control by
setting it writable.
Daniel Vetter [Tue, 13 Dec 2016 19:54:14 +0000 (20:54 +0100)]
drm/i915: tune down the fast link training vs boot fail
It's been unfixed since a while and no one is immediately working on
this. And we have the FIXME already. And now also a task in the DP
team's backlog.
Chris Wilson [Wed, 7 Dec 2016 13:34:11 +0000 (13:34 +0000)]
drm/i915: Reorder phys backing storage release
In commit a4f5ea64f0a8 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.
v2: Add commentary about always aligning to page size.