linux / backbone-sources

SSH

To clone this repository:

git clone git@git.backbone.ws:linux/backbone-sources.git

To push to this repository:

# Add a new remote
git remote add origin git@git.backbone.ws:linux/backbone-sources.git

# Push the master branch to the newly added origin, and configure
# this remote and branch as the default:
git push -u origin master

# From now on you can push master to the "origin" remote with:
git push
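
If a remote named "origin" already exists (for example, after cloning), re-point it at this repository instead of adding a new one; these are standard git commands, shown here for convenience:

# Update the existing remote and verify the result
git remote set-url origin git@git.backbone.ws:linux/backbone-sources.git
git remote -v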

Diffs from 7b6b50b to ffa4e9b

Commits

Paolo Valente Add BFQ-v8r12 0531453 20 days ago
Paolo Valente Add extra checks related to entity scheduling 3307c31 20 days ago
Paolo Valente block, bfq: reset in_service_entity if it becomes idle c385e1c 20 days ago
Paolo Valente block, bfq: consider also in_service_entity to state whether an entity is active 8d3bd5f 20 days ago
Paolo Valente block, bfq: improve and refactor throughput-boosting logic 8930e6f 20 days ago
Paolo Valente FIRST BFQ-MQ COMMIT: Copy bfq-sq-iosched.c as bfq-mq-iosched.c f3ad13b 20 days ago
Paolo Valente Add config and build bits for bfq-mq-iosched df5060e 20 days ago
Paolo Valente Increase max policies for io controller 9a7b790 20 days ago
Paolo Valente Copy header file bfq.h as bfq-mq.h 948264e 20 days ago
Paolo Valente Move thinktime from bic to bfqq 227417b 20 days ago
Paolo Valente Embed bfq-ioc.c and add locking on request queue b6be3a3 20 days ago
Paolo Valente Modify interface and operation to comply with blk-mq-sched 81d3fb6 20 days ago
Paolo Valente Add checks and extra log messages - Part I a0f7c65 20 days ago
Paolo Valente Add lock check in bfq_allow_bio_merge b4080a2 20 days ago
Paolo Valente bfq-mq: execute exit_icq operations immediately e80cc1f 20 days ago
Paolo Valente Unnest request-queue and ioc locks from scheduler locks 21d41c4 20 days ago
Paolo Valente Add checks and extra log messages - Part II 8a755e3 20 days ago
Paolo Valente Fix unbalanced increment of rq_in_driver 68aac18 20 days ago
Paolo Valente Add checks and extra log messages - Part III 1e572a0 20 days ago
Paolo Valente TESTING: Check wrong invocation of merge and put_rq_priv functions e2a4ede 20 days ago
Paolo Valente Complete support for cgroups 2c34e56 20 days ago
Paolo Valente Remove all get and put of I/O contexts 628a2b6 20 days ago
Paolo Valente BUGFIX: Remove unneeded and deadlock-causing lock in request_merged 317857e 20 days ago
Paolo Valente Fix wrong unlikely 0bb1848 20 days ago
Paolo Valente Change cgroup params prefix to bfq-mq for bfq-mq d16baa0 20 days ago
Paolo Valente Add tentative extra tests on groups, reqs and queues 9015f60 20 days ago
Paolo Valente block, bfq-mq: access and cache blkg data only when safe 7967ed2 20 days ago
Paolo Valente bfq-mq: fix macro name in conditional invocation of policy_unregister a9b2436 20 days ago
Paolo Valente Port of "blk-mq-sched: unify request finished methods" ab4ec6b 20 days ago
Paolo Valente Port of "bfq-iosched: fix NULL ioc check in bfq_get_rq_private" 66cf587 20 days ago
Paolo Valente Port of "blk-mq-sched: unify request prepare methods" 0932ca1 20 days ago
Paolo Valente Add list of bfq instances to documentation a532e78 20 days ago
Paolo Valente bfq-sq: fix prefix of names of cgroups parameters a415726 20 days ago
Paolo Valente Add to documentation that bfq-mq and bfq-sq contain last fixes too f00d136 20 days ago
Paolo Valente Improve most frequently used no-logging path a94af0b 20 days ago
Paolo Valente bfq-sq: fix commit "Remove all get and put of I/O contexts" in branch bfq-mq 7d5757b 20 days ago
Paolo Valente bfq-sq-mq: make lookup_next_entity push up vtime on expirations aa56aa1 20 days ago
Paolo Valente bfq-sq-mq: remove direct switch to an entity in higher class e4aa534 20 days ago
Paolo Valente bfq-sq-mq: guarantee update_next_in_service always returns an eligible entity 7da10cc 20 days ago
Paolo Valente doc, block, bfq: fix some typos and stale sentences ea8c61f 20 days ago
Paolo Valente bfq-mq, bfq-sq: Disable writeback throttling 97db80e 20 days ago
Paolo Valente bfq-mq, bfq-sq: fix wrong init of saved start time for weight raising 20c5075 20 days ago
Paolo Valente Fix commit "Unnest request-queue and ioc locks from scheduler locks" c310ef7 20 days ago
Paolo Valente bfq-sq, bfq-mq: check and switch back to interactive wr also on queue split aada918 20 days ago
Paolo Valente bfq-sq, bfq-mq: let early-merged queues be weight-raised on split too 4719bad 20 days ago
Paolo Valente bfq-sq, bfq-mq: decrease burst size when queues in burst exit efc813b 20 days ago
Paolo Valente bfq-sq, bfq-mq: fix unbalanced decrements of burst size 6ba103a 20 days ago
Paolo Valente doc, block, bfq-mq: update max IOPS sustainable with BFQ d9fe8cc 20 days ago
Paolo Valente block, bfq-mq: add missing invocations of bfqg_stats_update_io_add/remove db40901 20 days ago
Paolo Valente block, bfq-mq: update blkio stats outside the scheduler lock cc23fea 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP 6713f1c 20 days ago
Paolo Valente block, bfq-mq: turn BUG_ON on request-size into WARN_ON 7bd365a 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: consider also past I/O in soft real-time detection 1097d36 20 days ago
Paolo Valente block, bfq-mq: fix occurrences of request prepare/finish methods' old names 2a09b50 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: add missing rq_pos_tree update on rq removal 4df1994 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: check low_latency flag in bfq_bfqq_save_state() b844e34 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: let a queue be merged only shortly after starting I/O 4cc6896 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: remove superfluous check in queue-merging setup 157f39c 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: increase threshold to deem I/O as random b82eb91 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: specify usage condition of delta_us in bfq_log_bfqq call b739dda 20 days ago
Paolo Valente block, bfq-mq: limit tags for writes and async I/O ae4310c 20 days ago
Paolo Valente bfq-sq, bfq-mq: limit sectors served with interactive weight raising 402e5f6 20 days ago
Paolo Valente bfq-sq, bfq-mq: put async queues for root bfq groups too 59efebb 20 days ago
Paolo Valente bfq-sq, bfq-mq: release oom-queue ref to root group on exit 2dfbaaa 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: trace get and put of bfq groups 13efe00 20 days ago
Paolo Valente bfq-sq, bfq-mq: compile group put for oom queue only if BFQ_GROUP_IOSCHED is set 816b77f 20 days ago
Paolo Valente block, bfq-sq, bfq-mq: remove trace_printks 643a89c 20 days ago
Paolo Valente block, bfq-mq: add requeue-request hook ce05027 12 days ago
Jan Alexander Steffens (heftig) Merge remote-tracking branch 'algodev/bfq-mq' into 4.15/bfq 54a4f78 12 days ago
Jan Alexander Steffens (heftig) Merge branch '4.15/bfq' into 4.15/master c7f90c2 12 days ago
Greg Kroah-Hartman KVM: x86: Make indirect calls in emulator speculation safe be88e93 12 days ago
Greg Kroah-Hartman KVM: VMX: Make indirect call speculation safe 96e1c36 12 days ago
Greg Kroah-Hartman module/retpoline: Warn about missing retpoline in module 2ce5583 12 days ago
Greg Kroah-Hartman x86/cpufeatures: Add CPUID_7_EDX CPUID leaf ad35224 12 days ago
Greg Kroah-Hartman x86/cpufeatures: Add Intel feature bits for Speculation Control 6acd374 12 days ago
Greg Kroah-Hartman x86/cpufeatures: Add AMD feature bits for Speculation Control c11a94a 12 days ago
Greg Kroah-Hartman x86/msr: Add definitions for new speculation control MSRs c32525a 12 days ago
Greg Kroah-Hartman x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown bcfd19e 12 days ago
Greg Kroah-Hartman x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes 727eca6 12 days ago
Greg Kroah-Hartman x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support c96b281 12 days ago
Greg Kroah-Hartman x86/alternative: Print unadorned pointers 739050a 12 days ago
Greg Kroah-Hartman x86/nospec: Fix header guards names 8810634 12 days ago
Greg Kroah-Hartman x86/bugs: Drop one "mitigation" from dmesg b635216 12 days ago
Greg Kroah-Hartman x86/cpu/bugs: Make retpoline module warning conditional d815b3a 12 days ago
Greg Kroah-Hartman x86/cpufeatures: Clean up Spectre v2 related CPUID flags 24516e9 12 days ago
Greg Kroah-Hartman x86/retpoline: Simplify vmexit_fill_RSB() 058840d 12 days ago
Greg Kroah-Hartman x86/speculation: Simplify indirect_branch_prediction_barrier() 0f6e6bc 12 days ago
Greg Kroah-Hartman auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE c7faead 12 days ago
Greg Kroah-Hartman iio: adc/accel: Fix up module licenses 39e8aa5 12 days ago
Greg Kroah-Hartman pinctrl: pxa: pxa2xx: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE 793cc74 12 days ago
Greg Kroah-Hartman ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE b053d9d 12 days ago
Greg Kroah-Hartman KVM: nVMX: Eliminate vmcs02 pool 81e19f1 12 days ago
Greg Kroah-Hartman KVM: VMX: introduce alloc_loaded_vmcs 3dcc781 12 days ago
Greg Kroah-Hartman objtool: Improve retpoline alternative handling 0603b36 12 days ago
Greg Kroah-Hartman objtool: Add support for alternatives at the end of a section 1e7c719 12 days ago
Greg Kroah-Hartman objtool: Warn on stripped section symbol dd12561 12 days ago
Greg Kroah-Hartman x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP 62c00e6 12 days ago
Greg Kroah-Hartman x86/spectre: Check CONFIG_RETPOLINE in command line parser 6ff25f6 12 days ago
Greg Kroah-Hartman x86/entry/64: Remove the SYSCALL64 fast path dd9708c 12 days ago
Greg Kroah-Hartman x86/entry/64: Push extra regs right away 6a35b18 12 days ago
Greg Kroah-Hartman x86/asm: Move 'status' from thread_struct to thread_info 6adfc96 12 days ago
Greg Kroah-Hartman Documentation: Document array_index_nospec a35f710 12 days ago
Greg Kroah-Hartman array_index_nospec: Sanitize speculative array de-references 8a1c71c 12 days ago
Greg Kroah-Hartman x86: Implement array_index_mask_nospec d9f2468 12 days ago
Greg Kroah-Hartman x86: Introduce barrier_nospec 7ec7f55 12 days ago
Greg Kroah-Hartman x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec fa46638 12 days ago
Greg Kroah-Hartman x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} bd74e76 12 days ago
Greg Kroah-Hartman x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec d193324 12 days ago
Greg Kroah-Hartman x86/get_user: Use pointer masking to limit speculation 31d4cf7 12 days ago
Greg Kroah-Hartman x86/syscall: Sanitize syscall table de-references under speculation fecca49 12 days ago
Greg Kroah-Hartman vfs, fdtable: Prevent bounds-check bypass via speculative execution 64dab84 12 days ago
Greg Kroah-Hartman nl80211: Sanitize array index in parse_txq_params d583ef2 12 days ago
Greg Kroah-Hartman x86/spectre: Report get_user mitigation for spectre_v1 bdfaac0 12 days ago
Greg Kroah-Hartman x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" 7aa1a17 12 days ago
Greg Kroah-Hartman x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel 9a417b0 12 days ago
Greg Kroah-Hartman x86/speculation: Use Indirect Branch Prediction Barrier in context switch 061c8e7 12 days ago
Greg Kroah-Hartman x86/paravirt: Remove 'noreplace-paravirt' cmdline option 6e33706 12 days ago
Greg Kroah-Hartman KVM: VMX: make MSR bitmaps per-VCPU b399b98 12 days ago
Greg Kroah-Hartman x86/kvm: Update spectre-v1 mitigation 9ec4cfc 12 days ago
Greg Kroah-Hartman x86/retpoline: Avoid retpolines for built-in __init functions 76e36de 12 days ago
Greg Kroah-Hartman x86/spectre: Simplify spectre_v2 command line parsing 28cf1d8 12 days ago
Greg Kroah-Hartman x86/pti: Mark constant arrays as __initconst d13d4d2 12 days ago
Greg Kroah-Hartman x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL 9e4d1de 12 days ago
Greg Kroah-Hartman KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX f13d175 12 days ago
Greg Kroah-Hartman KVM/x86: Add IBPB support 4659554 12 days ago
Greg Kroah-Hartman KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES 3d6e862 12 days ago
Greg Kroah-Hartman KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL 6d45809 12 days ago
Greg Kroah-Hartman KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL bad75ea 12 days ago
Greg Kroah-Hartman serial: core: mark port as initialized after successful IRQ change b796d30 12 days ago
Greg Kroah-Hartman fpga: region: release of_parse_phandle nodes after use 3531454 12 days ago
Greg Kroah-Hartman Linux 4.15.2 db22ec4 12 days ago
Jan Alexander Steffens (heftig) drivers: Fix some __read_overflow2 errors with -O3 inlining 0808134 12 days ago
Jan Alexander Steffens (heftig) Merge branch '4.15/misc' into 4.15/master 50df0c3 12 days ago
Jan Alexander Steffens (heftig) fixup! drivers: Fix some __read_overflow2 errors with -O3 inlining 5cffdb0 12 days ago
Jan Alexander Steffens (heftig) Merge branch '4.15/misc' into 4.15/master 659d348 12 days ago
Jan Alexander Steffens (heftig) Merge tag 'v4.15.2' into 4.15/master 01ae5f0 12 days ago
Kolan Sh Linux 4.15 merged ae01e4e 11 days ago
Kolan Sh Zen merged into 4.15 ffa4e9b 11 days ago

Summary

2742 2742 norandmaps Don't use address space randomization. Equivalent to
2743 2743 echo 0 > /proc/sys/kernel/randomize_va_space
2744 2744
2745 noreplace-paravirt [X86,IA-64,PV_OPS] Don't patch paravirt_ops
2746
2747 2745 noreplace-smp [X86-32,SMP] Don't replace SMP instructions
2748 2746 with UP alternatives
2749 2747
1 This document explains potential effects of speculation, and how undesirable
2 effects can be mitigated portably using common APIs.
3
4 ===========
5 Speculation
6 ===========
7
8 To improve performance and minimize average latencies, many contemporary CPUs
9 employ speculative execution techniques such as branch prediction, performing
10 work which may be discarded at a later stage.
11
12 Typically speculative execution cannot be observed from architectural state,
13 such as the contents of registers. However, in some cases it is possible to
14 observe its impact on microarchitectural state, such as the presence or
15 absence of data in caches. Such state may form side-channels which can be
16 observed to extract secret information.
17
18 For example, in the presence of branch prediction, it is possible for bounds
19 checks to be ignored by code which is speculatively executed. Consider the
20 following code:
21
22 int load_array(int *array, unsigned int index)
23 {
24 if (index >= MAX_ARRAY_ELEMS)
25 return 0;
26 else
27 return array[index];
28 }
29
30 Which, on arm64, may be compiled to an assembly sequence such as:
31
32 CMP <index>, #MAX_ARRAY_ELEMS
33 B.LT less
34 MOV <returnval>, #0
35 RET
36 less:
37 LDR <returnval>, [<array>, <index>]
38 RET
39
40 It is possible that a CPU mis-predicts the conditional branch, and
41 speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
42 value will subsequently be discarded, but the speculated load may affect
43 microarchitectural state which can be subsequently measured.
44
45 More complex sequences involving multiple dependent memory accesses may
46 result in sensitive information being leaked. Consider the following
47 code, building on the prior example:
48
49 int load_dependent_arrays(int *arr1, int *arr2, int index)
50 {
51 int val1, val2;
52
53 val1 = load_array(arr1, index);
54 val2 = load_array(arr2, val1);
55
56 return val2;
57 }
58
59 Under speculation, the first call to load_array() may return the value
60 of an out-of-bounds address, while the second call will influence
61 microarchitectural state dependent on this value. This may provide an
62 arbitrary read primitive.
63
64 ====================================
65 Mitigating speculation side-channels
66 ====================================
67
68 The kernel provides a generic API to ensure that bounds checks are
69 respected even under speculation. Architectures which are affected by
70 speculation-based side-channels are expected to implement these
71 primitives.
72
73 The array_index_nospec() helper in <linux/nospec.h> can be used to
74 prevent information from being leaked via side-channels.
75
76 A call to array_index_nospec(index, size) returns a sanitized index
77 value that is bounded to [0, size) even under cpu speculation
78 conditions.
79
80 This can be used to protect the earlier load_array() example:
81
82 int load_array(int *array, unsigned int index)
83 {
84 if (index >= MAX_ARRAY_ELEMS)
85 return 0;
86 else {
87 index = array_index_nospec(index, MAX_ARRAY_ELEMS);
88 return array[index];
89 }
90 }
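
For context, array_index_nospec() is built on an architecture-provided mask primitive. A minimal sketch of the idea, assuming an array_index_mask_nospec() that returns ~0UL when index < size and 0 otherwise (this is an illustration, not the verbatim <linux/nospec.h> macro, and the helper name below is made up):

/*
 * Sketch only (not the kernel's implementation): clamp index to
 * [0, size) without a predictable branch. When the bounds check
 * fails the mask is 0, so the returned index is forced to 0 even
 * on a speculative path; when it passes, the index is unchanged.
 */
static inline unsigned long index_nospec_sketch(unsigned long index,
						unsigned long size)
{
	unsigned long mask = array_index_mask_nospec(index, size);

	return index & mask;
}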
1 1 # SPDX-License-Identifier: GPL-2.0
2 2 VERSION = 4
3 3 PATCHLEVEL = 15
4 SUBLEVEL = 1
4 SUBLEVEL = 2
5 5 EXTRAVERSION = -backbone
6 6 NAME = Fearless Coyote
7 7
21 21 #include <linux/export.h>
22 22 #include <linux/context_tracking.h>
23 23 #include <linux/user-return-notifier.h>
24 #include <linux/nospec.h>
24 25 #include <linux/uprobes.h>
25 26 #include <linux/livepatch.h>
26 27 #include <linux/syscalls.h>
207 207 * special case only applies after poking regs and before the
208 208 * very next return to user mode.
209 209 */
210 current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
210 ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
211 211 #endif
212 212
213 213 user_enter_irqoff();
283 283 * regs->orig_ax, which changes the behavior of some syscalls.
284 284 */
285 285 if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
286 regs->ax = sys_call_table[nr & __SYSCALL_MASK](
286 nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
287 regs->ax = sys_call_table[nr](
287 288 regs->di, regs->si, regs->dx,
288 289 regs->r10, regs->r8, regs->r9);
289 290 }
306 306 unsigned int nr = (unsigned int)regs->orig_ax;
307 307
308 308 #ifdef CONFIG_IA32_EMULATION
309 current->thread.status |= TS_COMPAT;
309 ti->status |= TS_COMPAT;
310 310 #endif
311 311
312 312 if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
320 320 }
321 321
322 322 if (likely(nr < IA32_NR_syscalls)) {
323 nr = array_index_nospec(nr, IA32_NR_syscalls);
323 324 /*
324 325 * It's possible that a 32-bit syscall implementation
325 326 * takes a 64-bit parameter but nonetheless assumes that
252 252 * exist, overwrite the RSB with entries which capture
253 253 * speculative execution to prevent attack.
254 254 */
255 FILL_RETURN_BUFFER %ebx, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
255 /* Clobbers %ebx */
256 FILL_RETURN_BUFFER RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
256 257 #endif
257 258
258 259 /* restore callee-saved registers */
236 236 pushq %r9 /* pt_regs->r9 */
237 237 pushq %r10 /* pt_regs->r10 */
238 238 pushq %r11 /* pt_regs->r11 */
239 sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
240 UNWIND_HINT_REGS extra=0
239 pushq %rbx /* pt_regs->rbx */
240 pushq %rbp /* pt_regs->rbp */
241 pushq %r12 /* pt_regs->r12 */
242 pushq %r13 /* pt_regs->r13 */
243 pushq %r14 /* pt_regs->r14 */
244 pushq %r15 /* pt_regs->r15 */
245 UNWIND_HINT_REGS
241 246
242 247 TRACE_IRQS_OFF
243 248
244 /*
245 * If we need to do entry work or if we guess we'll need to do
246 * exit work, go straight to the slow path.
247 */
248 movq PER_CPU_VAR(current_task), %r11
249 testl $_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
250 jnz entry_SYSCALL64_slow_path
251
252 entry_SYSCALL_64_fastpath:
253 /*
254 * Easy case: enable interrupts and issue the syscall. If the syscall
255 * needs pt_regs, we'll call a stub that disables interrupts again
256 * and jumps to the slow path.
257 */
258 TRACE_IRQS_ON
259 ENABLE_INTERRUPTS(CLBR_NONE)
260 #if __SYSCALL_MASK == ~0
261 cmpq $__NR_syscall_max, %rax
262 #else
263 andl $__SYSCALL_MASK, %eax
264 cmpl $__NR_syscall_max, %eax
265 #endif
266 ja 1f /* return -ENOSYS (already in pt_regs->ax) */
267 movq %r10, %rcx
268
269 /*
270 * This call instruction is handled specially in stub_ptregs_64.
271 * It might end up jumping to the slow path. If it jumps, RAX
272 * and all argument registers are clobbered.
273 */
274 #ifdef CONFIG_RETPOLINE
275 movq sys_call_table(, %rax, 8), %rax
276 call __x86_indirect_thunk_rax
277 #else
278 call *sys_call_table(, %rax, 8)
279 #endif
280 .Lentry_SYSCALL_64_after_fastpath_call:
281
282 movq %rax, RAX(%rsp)
283 1:
284
285 /*
286 * If we get here, then we know that pt_regs is clean for SYSRET64.
287 * If we see that no exit work is required (which we are required
288 * to check with IRQs off), then we can go straight to SYSRET64.
289 */
290 DISABLE_INTERRUPTS(CLBR_ANY)
291 TRACE_IRQS_OFF
292 movq PER_CPU_VAR(current_task), %r11
293 testl $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
294 jnz 1f
295
296 LOCKDEP_SYS_EXIT
297 TRACE_IRQS_ON /* user mode is traced as IRQs on */
298 movq RIP(%rsp), %rcx
299 movq EFLAGS(%rsp), %r11
300 addq $6*8, %rsp /* skip extra regs -- they were preserved */
301 UNWIND_HINT_EMPTY
302 jmp .Lpop_c_regs_except_rcx_r11_and_sysret
303
304 1:
305 /*
306 * The fast path looked good when we started, but something changed
307 * along the way and we need to switch to the slow path. Calling
308 * raise(3) will trigger this, for example. IRQs are off.
309 */
310 TRACE_IRQS_ON
311 ENABLE_INTERRUPTS(CLBR_ANY)
312 SAVE_EXTRA_REGS
313 movq %rsp, %rdi
314 call syscall_return_slowpath /* returns with IRQs disabled */
315 jmp return_from_SYSCALL_64
316
317 entry_SYSCALL64_slow_path:
318 249 /* IRQs are off. */
319 SAVE_EXTRA_REGS
320 250 movq %rsp, %rdi
321 251 call do_syscall_64 /* returns with IRQs disabled */
322 252
323 return_from_SYSCALL_64:
324 253 TRACE_IRQS_IRETQ /* we're about to change IF */
325 254
326 255 /*
322 322 /* rcx and r11 are already restored (see code above) */
323 323 UNWIND_HINT_EMPTY
324 324 POP_EXTRA_REGS
325 .Lpop_c_regs_except_rcx_r11_and_sysret:
326 325 popq %rsi /* skip r11 */
327 326 popq %r10
328 327 popq %r9
352 352 USERGS_SYSRET64
353 353 END(entry_SYSCALL_64)
354 354
355 ENTRY(stub_ptregs_64)
356 /*
357 * Syscalls marked as needing ptregs land here.
358 * If we are on the fast path, we need to save the extra regs,
359 * which we achieve by trying again on the slow path. If we are on
360 * the slow path, the extra regs are already saved.
361 *
362 * RAX stores a pointer to the C function implementing the syscall.
363 * IRQs are on.
364 */
365 cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
366 jne 1f
367
368 /*
369 * Called from fast path -- disable IRQs again, pop return address
370 * and jump to slow path
371 */
372 DISABLE_INTERRUPTS(CLBR_ANY)
373 TRACE_IRQS_OFF
374 popq %rax
375 UNWIND_HINT_REGS extra=0
376 jmp entry_SYSCALL64_slow_path
377
378 1:
379 JMP_NOSPEC %rax /* Called from C */
380 END(stub_ptregs_64)
381
382 .macro ptregs_stub func
383 ENTRY(ptregs_\func)
384 UNWIND_HINT_FUNC
385 leaq \func(%rip), %rax
386 jmp stub_ptregs_64
387 END(ptregs_\func)
388 .endm
389
390 /* Instantiate ptregs_stub for each ptregs-using syscall */
391 #define __SYSCALL_64_QUAL_(sym)
392 #define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
393 #define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
394 #include <asm/syscalls_64.h>
395
396 355 /*
397 356 * %rdi: prev task
398 357 * %rsi: next task
386 386 * exist, overwrite the RSB with entries which capture
387 387 * speculative execution to prevent attack.
388 388 */
389 FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
389 /* Clobbers %rbx */
390 FILL_RETURN_BUFFER RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
390 391 #endif
391 392
392 393 /* restore callee-saved registers */
7 7 #include <asm/asm-offsets.h>
8 8 #include <asm/syscall.h>
9 9
10 #define __SYSCALL_64_QUAL_(sym) sym
11 #define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
12
13 #define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
10 #define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
14 11 #include <asm/syscalls_64.h>
15 12 #undef __SYSCALL_64
16 13
17 #define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
14 #define __SYSCALL_64(nr, sym, qual) [nr] = sym,
18 15
19 16 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
20 17
38 38 INDIRECT_THUNK(si)
39 39 INDIRECT_THUNK(di)
40 40 INDIRECT_THUNK(bp)
41 asmlinkage void __fill_rsb(void);
42 asmlinkage void __clear_rsb(void);
43
41 44 #endif /* CONFIG_RETPOLINE */
24 24 #define wmb() asm volatile("sfence" ::: "memory")
25 25 #endif
26 26
27 /**
28 * array_index_mask_nospec() - generate a mask that is ~0UL when the
29 * bounds check succeeds and 0 otherwise
30 * @index: array element index
31 * @size: number of elements in array
32 *
33 * Returns:
34 * 0 - (index < size)
35 */
36 static inline unsigned long array_index_mask_nospec(unsigned long index,
37 unsigned long size)
38 {
39 unsigned long mask;
40
41 asm ("cmp %1,%2; sbb %0,%0;"
42 :"=r" (mask)
43 :"r"(size),"r" (index)
44 :"cc");
45 return mask;
46 }
47
48 /* Override the default implementation from linux/nospec.h. */
49 #define array_index_mask_nospec array_index_mask_nospec
50
51 /* Prevent speculative execution past this barrier. */
52 #define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
53 "lfence", X86_FEATURE_LFENCE_RDTSC)
54
27 55 #ifdef CONFIG_X86_PPRO_FENCE
28 56 #define dma_rmb() rmb()
29 57 #else
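
The cmp/sbb pair above is a branch-free way to build that mask: CMP computes index - size and sets the carry flag when index < size (unsigned borrow), and SBB of a register with itself then yields all ones if the carry was set and zero otherwise. A plain-C restatement of the semantics, for readability only — the hypothetical helper below uses an ordinary comparison and is therefore not itself speculation-safe:

/* Hypothetical helper: what the cmp/sbb sequence computes. */
static inline unsigned long mask_semantics(unsigned long index,
					   unsigned long size)
{
	return (index < size) ? ~0UL : 0UL;
}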
29 29 CPUID_8000_000A_EDX,
30 30 CPUID_7_ECX,
31 31 CPUID_8000_0007_EBX,
32 CPUID_7_EDX,
32 33 };
33 34
34 35 #ifdef CONFIG_X86_FEATURE_NAMES
80 80 CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) || \
81 81 CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \
82 82 CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \
83 CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \
83 84 REQUIRED_MASK_CHECK || \
84 BUILD_BUG_ON_ZERO(NCAPINTS != 18))
85 BUILD_BUG_ON_ZERO(NCAPINTS != 19))
85 86
86 87 #define DISABLED_MASK_BIT_SET(feature_bit) \
87 88 ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \
103 103 CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) || \
104 104 CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \
105 105 CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \
106 CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \
106 107 DISABLED_MASK_CHECK || \
107 BUILD_BUG_ON_ZERO(NCAPINTS != 18))
108 BUILD_BUG_ON_ZERO(NCAPINTS != 19))
108 109
109 110 #define cpu_has(c, bit) \
110 111 (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
13 13 /*
14 14 * Defines x86 CPU feature bits
15 15 */
16 #define NCAPINTS 18 /* N 32-bit words worth of info */
16 #define NCAPINTS 19 /* N 32-bit words worth of info */
17 17 #define NBUGINTS 1 /* N 32-bit bug flags */
18 18
19 19 /*
203 203 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
204 204 #define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
205 205 #define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
206 #define X86_FEATURE_RETPOLINE ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */
207 #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
206 #define X86_FEATURE_RETPOLINE ( 7*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
207 #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* "" AMD Retpoline mitigation for Spectre variant 2 */
208 208 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
209 #define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
210 #define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
211 209
212 210 #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
213 #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */
211 #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
214 212
213 #define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
214
215 215 /* Virtualization flags: Linux defined, word 8 */
216 216 #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
217 217 #define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
271 271 #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
272 272 #define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */
273 273 #define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* Always save/restore FP error pointers */
274 #define X86_FEATURE_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */
275 #define X86_FEATURE_IBRS (13*32+14) /* Indirect Branch Restricted Speculation */
276 #define X86_FEATURE_STIBP (13*32+15) /* Single Thread Indirect Branch Predictors */
274 277
275 278 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
276 279 #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
321 321 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
322 322 #define X86_FEATURE_SUCCOR (17*32+ 1) /* Uncorrectable error containment and recovery */
323 323 #define X86_FEATURE_SMCA (17*32+ 3) /* Scalable MCA */
324
325 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
326 #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */
327 #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
328 #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
329 #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
330 #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
324 331
325 332 /*
326 333 * BUG word(s)
77 77 #define DISABLED_MASK15 0
78 78 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
79 79 #define DISABLED_MASK17 0
80 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
80 #define DISABLED_MASK18 0
81 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
81 82
82 83 #endif /* _ASM_X86_DISABLED_FEATURES_H */
137 137
138 138 extern void reserve_top_address(unsigned long reserve);
139 139
140 #define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
141 #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
140 #define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
141 #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
142 #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT)
143 #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE)
142 144
143 145 extern int fixmaps_set;
144 146
39 39
40 40 /* Intel MSRs. Some also available on other CPUs */
41 41
42 #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
43 #define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */
44 #define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */
45
46 #define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
47 #define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */
48
42 49 #define MSR_PPIN_CTL 0x0000004e
43 50 #define MSR_PPIN 0x0000004f
44 51
64 64 #define SNB_C3_AUTO_UNDEMOTE (1UL << 28)
65 65
66 66 #define MSR_MTRRcap 0x000000fe
67
68 #define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
69 #define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */
70 #define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */
71
67 72 #define MSR_IA32_BBL_CR_CTL 0x00000119
68 73 #define MSR_IA32_BBL_CR_CTL3 0x0000011e
69 74
214 214 * that some other imaginary CPU is updating continuously with a
215 215 * time stamp.
216 216 */
217 alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
218 "lfence", X86_FEATURE_LFENCE_RDTSC);
217 barrier_nospec();
219 218 return rdtsc();
220 219 }
221 220
1 1 /* SPDX-License-Identifier: GPL-2.0 */
2 2
3 #ifndef __NOSPEC_BRANCH_H__
4 #define __NOSPEC_BRANCH_H__
3 #ifndef _ASM_X86_NOSPEC_BRANCH_H_
4 #define _ASM_X86_NOSPEC_BRANCH_H_
5 5
6 6 #include <asm/alternative.h>
7 7 #include <asm/alternative-asm.h>
8 8 #include <asm/cpufeatures.h>
9 9
10 /*
11 * Fill the CPU return stack buffer.
12 *
13 * Each entry in the RSB, if used for a speculative 'ret', contains an
14 * infinite 'pause; lfence; jmp' loop to capture speculative execution.
15 *
16 * This is required in various cases for retpoline and IBRS-based
17 * mitigations for the Spectre variant 2 vulnerability. Sometimes to
18 * eliminate potentially bogus entries from the RSB, and sometimes
19 * purely to ensure that it doesn't get empty, which on some CPUs would
20 * allow predictions from other (unwanted!) sources to be used.
21 *
22 * We define a CPP macro such that it can be used from both .S files and
23 * inline assembly. It's possible to do a .macro and then include that
24 * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
25 */
26
27 #define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */
28 #define RSB_FILL_LOOPS 16 /* To avoid underflow */
29
30 /*
31 * Google experimented with loop-unrolling and this turned out to be
32 * the optimal version — two calls, each with their own speculation
33 * trap should their return address end up getting used, in a loop.
34 */
35 #define __FILL_RETURN_BUFFER(reg, nr, sp) \
36 mov $(nr/2), reg; \
37 771: \
38 call 772f; \
39 773: /* speculation trap */ \
40 pause; \
41 lfence; \
42 jmp 773b; \
43 772: \
44 call 774f; \
45 775: /* speculation trap */ \
46 pause; \
47 lfence; \
48 jmp 775b; \
49 774: \
50 dec reg; \
51 jnz 771b; \
52 add $(BITS_PER_LONG/8) * nr, sp;
53
54 10 #ifdef __ASSEMBLY__
55 11
56 12 /*
77 77 #endif
78 78 .endm
79 79
80 /*
81 * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
82 * monstrosity above, manually.
83 */
84 .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
80 /* This clobbers the BX register */
81 .macro FILL_RETURN_BUFFER nr:req ftr:req
85 82 #ifdef CONFIG_RETPOLINE
86 ANNOTATE_NOSPEC_ALTERNATIVE
87 ALTERNATIVE "jmp .Lskip_rsb_\@", \
88 __stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)) \
89 \ftr
90 .Lskip_rsb_\@:
83 ALTERNATIVE "", "call __clear_rsb", \ftr
91 84 #endif
92 85 .endm
93 86
150 150 * On VMEXIT we must ensure that no RSB predictions learned in the guest
151 151 * can be followed in the host, by overwriting the RSB completely. Both
152 152 * retpoline and IBRS mitigations for Spectre v2 need this; only on future
153 * CPUs with IBRS_ATT *might* it be avoided.
153 * CPUs with IBRS_ALL *might* it be avoided.
154 154 */
155 155 static inline void vmexit_fill_RSB(void)
156 156 {
157 157 #ifdef CONFIG_RETPOLINE
158 unsigned long loops;
159
160 asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
161 ALTERNATIVE("jmp 910f",
162 __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
163 X86_FEATURE_RETPOLINE)
164 "910:"
165 : "=r" (loops), ASM_CALL_CONSTRAINT
166 : : "memory" );
158 alternative_input("",
159 "call __fill_rsb",
160 X86_FEATURE_RETPOLINE,
161 ASM_NO_INPUT_CLOBBER(_ASM_BX, "memory"));
167 162 #endif
168 163 }
169 164
165 static inline void indirect_branch_prediction_barrier(void)
166 {
167 alternative_input("",
168 "call __ibp_barrier",
169 X86_FEATURE_USE_IBPB,
170 ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
171 }
172
170 173 #endif /* __ASSEMBLY__ */
171 #endif /* __NOSPEC_BRANCH_H__ */
174 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
44 44 */
45 45 #define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40)
46 46
47 #define CPU_ENTRY_AREA_BASE \
48 ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK)
47 #define CPU_ENTRY_AREA_BASE \
48 ((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \
49 & PMD_MASK)
49 50
50 51 #define PKMAP_BASE \
51 52 ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
460 460 unsigned short gsindex;
461 461 #endif
462 462
463 u32 status; /* thread synchronous flags */
464
465 463 #ifdef CONFIG_X86_64
466 464 unsigned long fsbase;
467 465 unsigned long gsbase;
969 969
970 970 void stop_this_cpu(void *dummy);
971 971 void df_debug(struct pt_regs *regs, long error_code);
972
973 void __ibp_barrier(void);
974
972 975 #endif /* _ASM_X86_PROCESSOR_H */
106 106 #define REQUIRED_MASK15 0
107 107 #define REQUIRED_MASK16 (NEED_LA57)
108 108 #define REQUIRED_MASK17 0
109 #define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
109 #define REQUIRED_MASK18 0
110 #define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
110 111
111 112 #endif /* _ASM_X86_REQUIRED_FEATURES_H */
60 60 * TS_COMPAT is set for 32-bit syscall entries and then
61 61 * remains set until we return to user mode.
62 62 */
63 if (task->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
63 if (task->thread_info.status & (TS_COMPAT|TS_I386_REGS_POKED))
64 64 /*
65 65 * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
66 66 * and will match correctly in comparisons.
116 116 unsigned long *args)
117 117 {
118 118 # ifdef CONFIG_IA32_EMULATION
119 if (task->thread.status & TS_COMPAT)
119 if (task->thread_info.status & TS_COMPAT)
120 120 switch (i) {
121 121 case 0:
122 122 if (!n--) break;
177 177 const unsigned long *args)
178 178 {
179 179 # ifdef CONFIG_IA32_EMULATION
180 if (task->thread.status & TS_COMPAT)
180 if (task->thread_info.status & TS_COMPAT)
181 181 switch (i) {
182 182 case 0:
183 183 if (!n--) break;
55 55
56 56 struct thread_info {
57 57 unsigned long flags; /* low level flags */
58 u32 status; /* thread synchronous flags */
58 59 };
59 60
60 61 #define INIT_THREAD_INFO(tsk) \
222 222 #define in_ia32_syscall() true
223 223 #else
224 224 #define in_ia32_syscall() (IS_ENABLED(CONFIG_IA32_EMULATION) && \
225 current->thread.status & TS_COMPAT)
225 current_thread_info()->status & TS_COMPAT)
226 226 #endif
227 227
228 228 /*
174 174 struct mm_struct *loaded_mm;
175 175 u16 loaded_mm_asid;
176 176 u16 next_asid;
177 /* last user mm's ctx id */
178 u64 last_ctx_id;
177 179
178 180 /*
179 181 * We can be in one of several states:
124 124
125 125 #define __uaccess_begin() stac()
126 126 #define __uaccess_end() clac()
127 #define __uaccess_begin_nospec() \
128 ({ \
129 stac(); \
130 barrier_nospec(); \
131 })
127 132
128 133 /*
129 134 * This is a type: either unsigned long, if the argument fits into
450 450 ({ \
451 451 int __gu_err; \
452 452 __inttype(*(ptr)) __gu_val; \
453 __uaccess_begin(); \
453 __uaccess_begin_nospec(); \
454 454 __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
455 455 __uaccess_end(); \
456 456 (x) = (__force __typeof__(*(ptr)))__gu_val; \
492 492 __uaccess_begin(); \
493 493 barrier();
494 494
495 #define uaccess_try_nospec do { \
496 current->thread.uaccess_err = 0; \
497 __uaccess_begin_nospec(); \
498
495 499 #define uaccess_catch(err) \
496 500 __uaccess_end(); \
497 501 (err) |= (current->thread.uaccess_err ? -EFAULT : 0); \
557 557 * get_user_ex(...);
558 558 * } get_user_catch(err)
559 559 */
560 #define get_user_try uaccess_try
560 #define get_user_try uaccess_try_nospec
561 561 #define get_user_catch(err) uaccess_catch(err)
562 562
563 563 #define get_user_ex(x, ptr) do { \
591 591 __typeof__(ptr) __uval = (uval); \
592 592 __typeof__(*(ptr)) __old = (old); \
593 593 __typeof__(*(ptr)) __new = (new); \
594 __uaccess_begin(); \
594 __uaccess_begin_nospec(); \
595 595 switch (size) { \
596 596 case 1: \
597 597 { \
29 29 switch (n) {
30 30 case 1:
31 31 ret = 0;
32 __uaccess_begin();
32 __uaccess_begin_nospec();
33 33 __get_user_asm_nozero(*(u8 *)to, from, ret,
34 34 "b", "b", "=q", 1);
35 35 __uaccess_end();
36 36 return ret;
37 37 case 2:
38 38 ret = 0;
39 __uaccess_begin();
39 __uaccess_begin_nospec();
40 40 __get_user_asm_nozero(*(u16 *)to, from, ret,
41 41 "w", "w", "=r", 2);
42 42 __uaccess_end();
43 43 return ret;
44 44 case 4:
45 45 ret = 0;
46 __uaccess_begin();
46 __uaccess_begin_nospec();
47 47 __get_user_asm_nozero(*(u32 *)to, from, ret,
48 48 "l", "k", "=r", 4);
49 49 __uaccess_end();
55 55 return copy_user_generic(dst, (__force void *)src, size);
56 56 switch (size) {
57 57 case 1:
58 __uaccess_begin();
58 __uaccess_begin_nospec();
59 59 __get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
60 60 ret, "b", "b", "=q", 1);
61 61 __uaccess_end();
62 62 return ret;
63 63 case 2:
64 __uaccess_begin();
64 __uaccess_begin_nospec();
65 65 __get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
66 66 ret, "w", "w", "=r", 2);
67 67 __uaccess_end();
68 68 return ret;
69 69 case 4:
70 __uaccess_begin();
70 __uaccess_begin_nospec();
71 71 __get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
72 72 ret, "l", "k", "=r", 4);
73 73 __uaccess_end();
74 74 return ret;
75 75 case 8:
76 __uaccess_begin();
76 __uaccess_begin_nospec();
77 77 __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
78 78 ret, "q", "", "=r", 8);
79 79 __uaccess_end();
80 80 return ret;
81 81 case 10:
82 __uaccess_begin();
82 __uaccess_begin_nospec();
83 83 __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
84 84 ret, "q", "", "=r", 10);
85 85 if (likely(!ret))
89 89 __uaccess_end();
90 90 return ret;
91 91 case 16:
92 __uaccess_begin();
92 __uaccess_begin_nospec();
93 93 __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
94 94 ret, "q", "", "=r", 16);
95 95 if (likely(!ret))
46 46 }
47 47 __setup("noreplace-smp", setup_noreplace_smp);
48 48
49 #ifdef CONFIG_PARAVIRT
50 static int __initdata_or_module noreplace_paravirt = 0;
51
52 static int __init setup_noreplace_paravirt(char *str)
53 {
54 noreplace_paravirt = 1;
55 return 1;
56 }
57 __setup("noreplace-paravirt", setup_noreplace_paravirt);
58 #endif
59
60 49 #define DPRINTK(fmt, args...) \
61 50 do { \
62 51 if (debug_alternative) \
287 287 tgt_rip = next_rip + o_dspl;
288 288 n_dspl = tgt_rip - orig_insn;
289 289
290 DPRINTK("target RIP: %p, new_displ: 0x%x", tgt_rip, n_dspl);
290 DPRINTK("target RIP: %px, new_displ: 0x%x", tgt_rip, n_dspl);
291 291
292 292 if (tgt_rip - orig_insn >= 0) {
293 293 if (n_dspl - 2 <= 127)
344 344 add_nops(instr + (a->instrlen - a->padlen), a->padlen);
345 345 local_irq_restore(flags);
346 346
347 DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
347 DUMP_BYTES(instr, a->instrlen, "%px: [%d:%d) optimized NOPs: ",
348 348 instr, a->instrlen - a->padlen, a->padlen);
349 349 }
350 350
365 365 u8 *instr, *replacement;
366 366 u8 insnbuf[MAX_PATCH_LEN];
367 367
368 DPRINTK("alt table %p -> %p", start, end);
368 DPRINTK("alt table %px, -> %px", start, end);
369 369 /*
370 370 * The scan order should be from start to end. A later scanned
371 371 * alternative code can overwrite previously scanned alternative code.
389 389 continue;
390 390 }
391 391
392 DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d), pad: %d",
392 DPRINTK("feat: %d*32+%d, old: (%px len: %d), repl: (%px, len: %d), pad: %d",
393 393 a->cpuid >> 5,
394 394 a->cpuid & 0x1f,
395 395 instr, a->instrlen,
396 396 replacement, a->replacementlen, a->padlen);
397 397
398 DUMP_BYTES(instr, a->instrlen, "%p: old_insn: ", instr);
399 DUMP_BYTES(replacement, a->replacementlen, "%p: rpl_insn: ", replacement);
398 DUMP_BYTES(instr, a->instrlen, "%px: old_insn: ", instr);
399 DUMP_BYTES(replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
400 400
401 401 memcpy(insnbuf, replacement, a->replacementlen);
402 402 insnbuf_sz = a->replacementlen;
422 422 a->instrlen - a->replacementlen);
423 423 insnbuf_sz += a->instrlen - a->replacementlen;
424 424 }
425 DUMP_BYTES(insnbuf, insnbuf_sz, "%p: final_insn: ", instr);
425 DUMP_BYTES(insnbuf, insnbuf_sz, "%px: final_insn: ", instr);
426 426
427 427 text_poke_early(instr, insnbuf, insnbuf_sz);
428 428 }
587 587 {
588 588 struct paravirt_patch_site *p;
589 589 char insnbuf[MAX_PATCH_LEN];
590
591 if (noreplace_paravirt)
592 return;
593 590
594 591 for (p = start; p < end; p++) {
595 592 unsigned int used;
11 11 #include <linux/init.h>
12 12 #include <linux/utsname.h>
13 13 #include <linux/cpu.h>
14 #include <linux/module.h>
14 15
15 16 #include <asm/nospec-branch.h>
16 17 #include <asm/cmdline.h>
91 91 };
92 92
93 93 #undef pr_fmt
94 #define pr_fmt(fmt) "Spectre V2 mitigation: " fmt
94 #define pr_fmt(fmt) "Spectre V2 : " fmt
95 95
96 96 static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
97 97
98 #ifdef RETPOLINE
99 static bool spectre_v2_bad_module;
100
101 bool retpoline_module_ok(bool has_retpoline)
102 {
103 if (spectre_v2_enabled == SPECTRE_V2_NONE || has_retpoline)
104 return true;
105
106 pr_err("System may be vulnerable to spectre v2\n");
107 spectre_v2_bad_module = true;
108 return false;
109 }
110
111 static inline const char *spectre_v2_module_string(void)
112 {
113 return spectre_v2_bad_module ? " - vulnerable module loaded" : "";
114 }
115 #else
116 static inline const char *spectre_v2_module_string(void) { return ""; }
117 #endif
118
98 119 static void __init spec2_print_if_insecure(const char *reason)
99 120 {
100 121 if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
101 pr_info("%s\n", reason);
122 pr_info("%s selected on command line.\n", reason);
102 123 }
103 124
104 125 static void __init spec2_print_if_secure(const char *reason)
105 126 {
106 127 if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
107 pr_info("%s\n", reason);
128 pr_info("%s selected on command line.\n", reason);
108 129 }
109 130
110 131 static inline bool retp_compiler(void)
140 140 return len == arglen && !strncmp(arg, opt, len);
141 141 }
142 142
143 static const struct {
144 const char *option;
145 enum spectre_v2_mitigation_cmd cmd;
146 bool secure;
147 } mitigation_options[] = {
148 { "off", SPECTRE_V2_CMD_NONE, false },
149 { "on", SPECTRE_V2_CMD_FORCE, true },
150 { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
151 { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false },
152 { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false },
153 { "auto", SPECTRE_V2_CMD_AUTO, false },
154 };
155
143 156 static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
144 157 {
145 158 char arg[20];
146 int ret;
159 int ret, i;
160 enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
147 161
148 ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
149 sizeof(arg));
150 if (ret > 0) {
151 if (match_option(arg, ret, "off")) {
152 goto disable;
153 } else if (match_option(arg, ret, "on")) {
154 spec2_print_if_secure("force enabled on command line.");
155 return SPECTRE_V2_CMD_FORCE;
156 } else if (match_option(arg, ret, "retpoline")) {
157 spec2_print_if_insecure("retpoline selected on command line.");
158 return SPECTRE_V2_CMD_RETPOLINE;
159 } else if (match_option(arg, ret, "retpoline,amd")) {
160 if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
161 pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
162 return SPECTRE_V2_CMD_AUTO;
163 }
164 spec2_print_if_insecure("AMD retpoline selected on command line.");
165 return SPECTRE_V2_CMD_RETPOLINE_AMD;
166 } else if (match_option(arg, ret, "retpoline,generic")) {
167 spec2_print_if_insecure("generic retpoline selected on command line.");
168 return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
169 } else if (match_option(arg, ret, "auto")) {
162 if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
163 return SPECTRE_V2_CMD_NONE;
164 else {
165 ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
166 sizeof(arg));
167 if (ret < 0)
170 168 return SPECTRE_V2_CMD_AUTO;
169
170 for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) {
171 if (!match_option(arg, ret, mitigation_options[i].option))
172 continue;
173 cmd = mitigation_options[i].cmd;
174 break;
171 175 }
176
177 if (i >= ARRAY_SIZE(mitigation_options)) {
178 pr_err("unknown option (%s). Switching to AUTO select\n",
179 mitigation_options[i].option);
180 return SPECTRE_V2_CMD_AUTO;
181 }
172 182 }
173 183
174 if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
184 if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
185 cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
186 cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
187 !IS_ENABLED(CONFIG_RETPOLINE)) {
188 pr_err("%s selected but not compiled in. Switching to AUTO select\n",
189 mitigation_options[i].option);
175 190 return SPECTRE_V2_CMD_AUTO;
176 disable:
177 spec2_print_if_insecure("disabled on command line.");
178 return SPECTRE_V2_CMD_NONE;
191 }
192
193 if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
194 boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
195 pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
196 return SPECTRE_V2_CMD_AUTO;
197 }
198
199 if (mitigation_options[i].secure)
200 spec2_print_if_secure(mitigation_options[i].option);
201 else
202 spec2_print_if_insecure(mitigation_options[i].option);
203
204 return cmd;
179 205 }
180 206
181 207 /* Check for Skylake-like CPUs (for RSB handling) */
239 239 return;
240 240
241 241 case SPECTRE_V2_CMD_FORCE:
242 /* FALLTRHU */
243 242 case SPECTRE_V2_CMD_AUTO:
244 goto retpoline_auto;
245
243 if (IS_ENABLED(CONFIG_RETPOLINE))
244 goto retpoline_auto;
245 break;
246 246 case SPECTRE_V2_CMD_RETPOLINE_AMD:
247 247 if (IS_ENABLED(CONFIG_RETPOLINE))
248 248 goto retpoline_amd;
297 297 setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
298 298 pr_info("Filling RSB on context switch\n");
299 299 }
300
301 /* Initialize Indirect Branch Prediction Barrier if supported */
302 if (boot_cpu_has(X86_FEATURE_IBPB)) {
303 setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
304 pr_info("Enabling Indirect Branch Prediction Barrier\n");
305 }
300 306 }
301 307
302 308 #undef pr_fmt
323 323 {
324 324 if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
325 325 return sprintf(buf, "Not affected\n");
326 return sprintf(buf, "Vulnerable\n");
326 return sprintf(buf, "Mitigation: __user pointer sanitization\n");
327 327 }
328 328
329 329 ssize_t cpu_show_spectre_v2(struct device *dev,
332 332 if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
333 333 return sprintf(buf, "Not affected\n");
334 334
335 return sprintf(buf, "%s\n", spectre_v2_strings[spectre_v2_enabled]);
335 return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
336 boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
337 spectre_v2_module_string());
336 338 }
337 339 #endif
340
341 void __ibp_barrier(void)
342 {
343 __wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
344 }
345 EXPORT_SYMBOL_GPL(__ibp_barrier);
47 47 #include <asm/pat.h>
48 48 #include <asm/microcode.h>
49 49 #include <asm/microcode_intel.h>
50 #include <asm/intel-family.h>
51 #include <asm/cpu_device_id.h>
50 52
51 53 #ifdef CONFIG_X86_LOCAL_APIC
52 54 #include <asm/uv/uv.h>
750 750 }
751 751 }
752 752
753 static void init_speculation_control(struct cpuinfo_x86 *c)
754 {
755 /*
756 * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
757 * and they also have a different bit for STIBP support. Also,
758 * a hypervisor might have set the individual AMD bits even on
759 * Intel CPUs, for finer-grained selection of what's available.
760 *
761 * We use the AMD bits in 0x8000_0008 EBX as the generic hardware
762 * features, which are visible in /proc/cpuinfo and used by the
763 * kernel. So set those accordingly from the Intel bits.
764 */
765 if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
766 set_cpu_cap(c, X86_FEATURE_IBRS);
767 set_cpu_cap(c, X86_FEATURE_IBPB);
768 }
769 if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
770 set_cpu_cap(c, X86_FEATURE_STIBP);
771 }
772
753 773 void get_cpu_cap(struct cpuinfo_x86 *c)
754 774 {
755 775 u32 eax, ebx, ecx, edx;
791 791 cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
792 792 c->x86_capability[CPUID_7_0_EBX] = ebx;
793 793 c->x86_capability[CPUID_7_ECX] = ecx;
794 c->x86_capability[CPUID_7_EDX] = edx;
794 795 }
795 796
796 797 /* Extended state features: level 0x0000000d */
864 864 c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
865 865
866 866 init_scattered_cpuid_features(c);
867 init_speculation_control(c);
867 868
868 869 /*
869 870 * Clear/Set all flags overridden by options, after probe.
900 900 #endif
901 901 }
902 902
903 static const __initconst struct x86_cpu_id cpu_no_speculation[] = {
904 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY },
905 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW, X86_FEATURE_ANY },
906 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT, X86_FEATURE_ANY },
907 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL, X86_FEATURE_ANY },
908 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW, X86_FEATURE_ANY },
909 { X86_VENDOR_CENTAUR, 5 },
910 { X86_VENDOR_INTEL, 5 },
911 { X86_VENDOR_NSC, 5 },
912 { X86_VENDOR_ANY, 4 },
913 {}
914 };
915
916 static const __initconst struct x86_cpu_id cpu_no_meltdown[] = {
917 { X86_VENDOR_AMD },
918 {}
919 };
920
921 static bool __init cpu_vulnerable_to_meltdown(struct cpuinfo_x86 *c)
922 {
923 u64 ia32_cap = 0;
924
925 if (x86_match_cpu(cpu_no_meltdown))
926 return false;
927
928 if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
929 rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
930
931 /* Rogue Data Cache Load? No! */
932 if (ia32_cap & ARCH_CAP_RDCL_NO)
933 return false;
934
935 return true;
936 }
937
903 938 /*
904 939 * Do minimum CPU detection early.
905 940 * Fields really needed: vendor, cpuid_level, family, model, mask,
982 982
983 983 setup_force_cpu_cap(X86_FEATURE_ALWAYS);
984 984
985 if (c->x86_vendor != X86_VENDOR_AMD)
986 setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
987
988 setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
989 setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
985 if (!x86_match_cpu(cpu_no_speculation)) {
986 if (cpu_vulnerable_to_meltdown(c))
987 setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
988 setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
989 setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
990 }
990 991
991 992 fpu__init_system(c);
992 993
102 102 ELF_HWCAP2 |= HWCAP2_RING3MWAIT;
103 103 }
104 104
105 /*
106 * Early microcode releases for the Spectre v2 mitigation were broken.
107 * Information taken from;
108 * - https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/microcode-update-guidance.pdf
109 * - https://kb.vmware.com/s/article/52345
110 * - Microcode revisions observed in the wild
111 * - Release note from 20180108 microcode release
112 */
113 struct sku_microcode {
114 u8 model;
115 u8 stepping;
116 u32 microcode;
117 };
118 static const struct sku_microcode spectre_bad_microcodes[] = {
119 { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x84 },
120 { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0A, 0x84 },
121 { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x84 },
122 { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x84 },
123 { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
124 { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
125 { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
126 { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
127 { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
128 { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
129 { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
130 { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
131 { INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x07000011 },
132 { INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
133 { INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
134 { INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
135 { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
136 { INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
137 { INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
138 { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
139 /* Updated in the 20180108 release; blacklist until we know otherwise */
140 { INTEL_FAM6_ATOM_GEMINI_LAKE, 0x01, 0x22 },
141 /* Observed in the wild */
142 { INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
143 { INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
144 };
145
146 static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
147 {
148 int i;
149
150 for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
151 if (c->x86_model == spectre_bad_microcodes[i].model &&
152 c->x86_mask == spectre_bad_microcodes[i].stepping)
153 return (c->microcode <= spectre_bad_microcodes[i].microcode);
154 }
155 return false;
156 }
157
105 158 static void early_init_intel(struct cpuinfo_x86 *c)
106 159 {
107 160 u64 misc_enable;
174 174
175 175 if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
176 176 c->microcode = intel_get_microcode_revision();
177
178 /* Now if any of them are set, check the blacklist and clear the lot */
179 if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
180 cpu_has(c, X86_FEATURE_INTEL_STIBP) ||
181 cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
182 cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) {
183 pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n");
184 setup_clear_cpu_cap(X86_FEATURE_IBRS);
185 setup_clear_cpu_cap(X86_FEATURE_IBPB);
186 setup_clear_cpu_cap(X86_FEATURE_STIBP);
187 setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
188 setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
189 }
177 190
178 191 /*
179 192 * Atom erratum AAE44/AAF40/AAG38/AAH41:
21 21 static const struct cpuid_bit cpuid_bits[] = {
22 22 { X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
23 23 { X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
24 { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
25 { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
26 24 { X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
27 25 { X86_FEATURE_CAT_L2, CPUID_EBX, 2, 0x00000010, 0 },
28 26 { X86_FEATURE_CDP_L3, CPUID_ECX, 2, 0x00000010, 1 },
557 557 * Pretend to come from a x32 execve.
558 558 */
559 559 task_pt_regs(current)->orig_ax = __NR_x32_execve | __X32_SYSCALL_BIT;
560 current->thread.status &= ~TS_COMPAT;
560 current_thread_info()->status &= ~TS_COMPAT;
561 561 #endif
562 562 }
563 563
571 571 current->personality |= force_personality32;
572 572 /* Prepare the first "return" to user space */
573 573 task_pt_regs(current)->orig_ax = __NR_ia32_execve;
574 current->thread.status |= TS_COMPAT;
574 current_thread_info()->status |= TS_COMPAT;
575 575 #endif
576 576 }
577 577
935 935 */
936 936 regs->orig_ax = value;
937 937 if (syscall_get_nr(child, regs) >= 0)
938 child->thread.status |= TS_I386_REGS_POKED;
938 child->thread_info.status |= TS_I386_REGS_POKED;
939 939 break;
940 940
941 941 case offsetof(struct user32, regs.eflags):
787 787 * than the tracee.
788 788 */
789 789 #ifdef CONFIG_IA32_EMULATION
790 if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
790 if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
791 791 return __NR_ia32_restart_syscall;
792 792 #endif
793 793 #ifdef CONFIG_X86_X32_ABI
67 67
68 68 #define F(x) bit(X86_FEATURE_##x)
69 69
70 /* These are scattered features in cpufeatures.h. */
71 #define KVM_CPUID_BIT_AVX512_4VNNIW 2
72 #define KVM_CPUID_BIT_AVX512_4FMAPS 3
70 /* For scattered features from cpufeatures.h; we currently expose none */
73 71 #define KF(x) bit(KVM_CPUID_BIT_##x)
74 72
75 73 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
365 365 F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
366 366 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
367 367
368 /* cpuid 0x80000008.ebx */
369 const u32 kvm_cpuid_8000_0008_ebx_x86_features =
370 F(IBPB) | F(IBRS);
371
368 372 /* cpuid 0xC0000001.edx */
369 373 const u32 kvm_cpuid_C000_0001_edx_x86_features =
370 374 F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
394 394
395 395 /* cpuid 7.0.edx*/
396 396 const u32 kvm_cpuid_7_0_edx_x86_features =
397 KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
397 F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
398 F(ARCH_CAPABILITIES);
398 399
399 400 /* all calls to cpuid_count() should be made on the same cpu */
400 401 get_cpu();
480 480 if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
481 481 entry->ecx &= ~F(PKU);
482 482 entry->edx &= kvm_cpuid_7_0_edx_x86_features;
483 entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
483 cpuid_mask(&entry->edx, CPUID_7_EDX);
484 484 } else {
485 485 entry->ebx = 0;
486 486 entry->ecx = 0;
630 630 if (!g_phys_as)
631 631 g_phys_as = phys_as;
632 632 entry->eax = g_phys_as | (virt_as << 8);
633 entry->ebx = entry->edx = 0;
633 entry->edx = 0;
634 /* IBRS and IBPB aren't necessarily present in hardware cpuid */
635 if (boot_cpu_has(X86_FEATURE_IBPB))
636 entry->ebx |= F(IBPB);
637 if (boot_cpu_has(X86_FEATURE_IBRS))
638 entry->ebx |= F(IBRS);
639 entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
640 cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
634 641 break;
635 642 }
636 643 case 0x80000019:
54 54 [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
55 55 [CPUID_7_ECX] = { 7, 0, CPUID_ECX},
56 56 [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
57 [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
57 58 };
58 59
59 60 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
25 25 #include <asm/kvm_emulate.h>
26 26 #include <linux/stringify.h>
27 27 #include <asm/debugreg.h>
28 #include <asm/nospec-branch.h>
28 29
29 30 #include "x86.h"
30 31 #include "tss.h"
1022 1022 void (*fop)(void) = (void *)em_setcc + 4 * (condition & 0xf);
1023 1023
1024 1024 flags = (flags & EFLAGS_MASK) | X86_EFLAGS_IF;
1025 asm("push %[flags]; popf; call *%[fastop]"
1026 : "=a"(rc) : [fastop]"r"(fop), [flags]"r"(flags));
1025 asm("push %[flags]; popf; " CALL_NOSPEC
1026 : "=a"(rc) : [thunk_target]"r"(fop), [flags]"r"(flags));
1027 1027 return rc;
1028 1028 }
1029 1029
5336 5336 if (!(ctxt->d & ByteOp))
5337 5337 fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
5338 5338
5339 asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
5339 asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
5340 5340 : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
5341 [fastop]"+S"(fop), ASM_CALL_CONSTRAINT
5341 [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
5342 5342 : "c"(ctxt->src2.val));
5343 5343
5344 5344 ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
184 184 u64 gs_base;
185 185 } host;
186 186
187 u64 spec_ctrl;
188
187 189 u32 *msrpm;
188 190
189 191 ulong nmi_iret_rip;
251 251 { .index = MSR_CSTAR, .always = true },
252 252 { .index = MSR_SYSCALL_MASK, .always = true },
253 253 #endif
254 { .index = MSR_IA32_SPEC_CTRL, .always = false },
255 { .index = MSR_IA32_PRED_CMD, .always = false },
254 256 { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
255 257 { .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
256 258 { .index = MSR_IA32_LASTINTFROMIP, .always = false },
533 533 struct kvm_ldttss_desc *tss_desc;
534 534
535 535 struct page *save_area;
536 struct vmcb *current_vmcb;
536 537 };
537 538
538 539 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
885 885 return false;
886 886 }
887 887
888 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
889 {
890 u8 bit_write;
891 unsigned long tmp;
892 u32 offset;
893 u32 *msrpm;
894
895 msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
896 to_svm(vcpu)->msrpm;
897
898 offset = svm_msrpm_offset(msr);
899 bit_write = 2 * (msr & 0x0f) + 1;
900 tmp = msrpm[offset];
901
902 BUG_ON(offset == MSR_INVALID);
903
904 return !!test_bit(bit_write, &tmp);
905 }
906
888 907 static void set_msr_interception(u32 *msrpm, unsigned msr,
889 908 int read, int write)
890 909 {
1606 1606 u32 dummy;
1607 1607 u32 eax = 1;
1608 1608
1609 svm->spec_ctrl = 0;
1610
1609 1611 if (!init_event) {
1610 1612 svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
1611 1613 MSR_IA32_APICBASE_ENABLE;
1729 1729 __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
1730 1730 kvm_vcpu_uninit(vcpu);
1731 1731 kmem_cache_free(kvm_vcpu_cache, svm);
1732 /*
1733 * The vmcb page can be recycled, causing a false negative in
1734 * svm_vcpu_load(). So do a full IBPB now.
1735 */
1736 indirect_branch_prediction_barrier();
1732 1737 }
1733 1738
1734 1739 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
1735 1740 {
1736 1741 struct vcpu_svm *svm = to_svm(vcpu);
1742 struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
1737 1743 int i;
1738 1744
1739 1745 if (unlikely(cpu != vcpu->cpu)) {
1768 1768 if (static_cpu_has(X86_FEATURE_RDTSCP))
1769 1769 wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
1770 1770
1771 if (sd->current_vmcb != svm->vmcb) {
1772 sd->current_vmcb = svm->vmcb;
1773 indirect_branch_prediction_barrier();
1774 }
1771 1775 avic_vcpu_load(vcpu, cpu);
1772 1776 }
1773 1777
3629 3629 case MSR_VM_CR:
3630 3630 msr_info->data = svm->nested.vm_cr_msr;
3631 3631 break;
3632 case MSR_IA32_SPEC_CTRL:
3633 if (!msr_info->host_initiated &&
3634 !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
3635 return 1;
3636
3637 msr_info->data = svm->spec_ctrl;
3638 break;
3632 3639 case MSR_IA32_UCODE_REV:
3633 3640 msr_info->data = 0x01000065;
3634 3641 break;
3727 3727 case MSR_IA32_TSC:
3728 3728 kvm_write_tsc(vcpu, msr);
3729 3729 break;
3730 case MSR_IA32_SPEC_CTRL:
3731 if (!msr->host_initiated &&
3732 !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
3733 return 1;
3734
3735 /* The STIBP bit doesn't fault even if it's not advertised */
3736 if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
3737 return 1;
3738
3739 svm->spec_ctrl = data;
3740
3741 if (!data)
3742 break;
3743
3744 /*
3745 * For non-nested:
3746 * When it's written (to non-zero) for the first time, pass
3747 * it through.
3748 *
3749 * For nested:
3750 * The handling of the MSR bitmap for L2 guests is done in
3751 * nested_svm_vmrun_msrpm.
3752 * We update the L1 MSR bit as well since it will end up
3753 * touching the MSR anyway now.
3754 */
3755 set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
3756 break;
3757 case MSR_IA32_PRED_CMD:
3758 if (!msr->host_initiated &&
3759 !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
3760 return 1;
3761
3762 if (data & ~PRED_CMD_IBPB)
3763 return 1;
3764
3765 if (!data)
3766 break;
3767
3768 wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
3769 if (is_guest_mode(vcpu))
3770 break;
3771 set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
3772 break;
3730 3773 case MSR_STAR:
3731 3774 svm->vmcb->save.star = data;
3732 3775 break;
5022 5022
5023 5023 local_irq_enable();
5024 5024
5025 /*
5026 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
5027 * it's non-zero. Since vmentry is serialising on affected CPUs, there
5028 * is no need to worry about the conditional branch over the wrmsr
5029 * being speculatively taken.
5030 */
5031 if (svm->spec_ctrl)
5032 wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
5033
5025 5034 asm volatile (
5026 5035 "push %%" _ASM_BP "; \n\t"
5027 5036 "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
5122 5122 , "ebx", "ecx", "edx", "esi", "edi"
5123 5123 #endif
5124 5124 );
5125
5126 /*
5127 * We do not use IBRS in the kernel. If this vCPU has used the
5128 * SPEC_CTRL MSR it may have left it on; save the value and
5129 * turn it off. This is much more efficient than blindly adding
5130 * it to the atomic save/restore list. Especially as the former
5131 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
5132 *
5133 * For non-nested case:
5134 * If the L01 MSR bitmap does not intercept the MSR, then we need to
5135 * save it.
5136 *
5137 * For nested case:
5138 * If the L02 MSR bitmap does not intercept the MSR, then we need to
5139 * save it.
5140 */
5141 if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
5142 rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
5143
5144 if (svm->spec_ctrl)
5145 wrmsrl(MSR_IA32_SPEC_CTRL, 0);
5125 5146
5126 5147 /* Eliminate branch target predictions from guest mode */
5127 5148 vmexit_fill_RSB();
34 34 #include <linux/tboot.h>
35 35 #include <linux/hrtimer.h>
36 36 #include <linux/frame.h>
37 #include <linux/nospec.h>
37 38 #include "kvm_cache_regs.h"
38 39 #include "x86.h"
39 40
112 112 static bool __read_mostly enable_pml = 1;
113 113 module_param_named(pml, enable_pml, bool, S_IRUGO);
114 114
115 #define MSR_TYPE_R 1
116 #define MSR_TYPE_W 2
117 #define MSR_TYPE_RW 3
118
119 #define MSR_BITMAP_MODE_X2APIC 1
120 #define MSR_BITMAP_MODE_X2APIC_APICV 2
121 #define MSR_BITMAP_MODE_LM 4
122
115 123 #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL
116 124
117 125 /* Guest_tsc -> host_tsc conversion requires 64-bit division. */
194 194 extern const ulong vmx_return;
195 195
196 196 #define NR_AUTOLOAD_MSRS 8
197 #define VMCS02_POOL_SIZE 1
198 197
199 198 struct vmcs {
200 199 u32 revision_id;
218 218 int soft_vnmi_blocked;
219 219 ktime_t entry_time;
220 220 s64 vnmi_blocked_time;
221 unsigned long *msr_bitmap;
221 222 struct list_head loaded_vmcss_on_cpu_link;
222 223 };
223 224
235 235 * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
236 236 * which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
237 237 * More than one of these structures may exist, if L1 runs multiple L2 guests.
238 * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
238 * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the
239 239 * underlying hardware which will be used to run L2.
240 240 * This structure is packed to ensure that its layout is identical across
241 241 * machines (necessary for live migration).
418 418 */
419 419 #define VMCS12_SIZE 0x1000
420 420
421 /* Used to remember the last vmcs02 used for some recently used vmcs12s */
422 struct vmcs02_list {
423 struct list_head list;
424 gpa_t vmptr;
425 struct loaded_vmcs vmcs02;
426 };
427
428 421 /*
429 422 * The nested_vmx structure is part of vcpu_vmx, and holds information we need
430 423 * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
442 442 */
443 443 bool sync_shadow_vmcs;
444 444
445 /* vmcs02_list cache of VMCSs recently used to run L2 guests */
446 struct list_head vmcs02_pool;
447 int vmcs02_num;
448 445 bool change_vmcs01_virtual_x2apic_mode;
449 446 /* L2 must run next, and mustn't decide to exit to L1. */
450 447 bool nested_run_pending;
448
449 struct loaded_vmcs vmcs02;
450
451 451 /*
452 * Guest pages referred to in vmcs02 with host-physical pointers, so
453 * we must keep them pinned while L2 runs.
452 * Guest pages referred to in the vmcs02 with host-physical
453 * pointers, so we must keep them pinned while L2 runs.
454 454 */
455 455 struct page *apic_access_page;
456 456 struct page *virtual_apic_page;
459 459 bool pi_pending;
460 460 u16 posted_intr_nv;
461 461
462 unsigned long *msr_bitmap;
463
464 462 struct hrtimer preemption_timer;
465 463 bool preemption_timer_expired;
466 464
581 581 struct kvm_vcpu vcpu;
582 582 unsigned long host_rsp;
583 583 u8 fail;
584 u8 msr_bitmap_mode;
584 585 u32 exit_intr_info;
585 586 u32 idt_vectoring_info;
586 587 ulong rflags;
593 593 u64 msr_host_kernel_gs_base;
594 594 u64 msr_guest_kernel_gs_base;
595 595 #endif
596
597 u64 arch_capabilities;
598 u64 spec_ctrl;
599
596 600 u32 vm_entry_controls_shadow;
597 601 u32 vm_exit_controls_shadow;
598 602 u32 secondary_exec_control;
903 903
904 904 static inline short vmcs_field_to_offset(unsigned long field)
905 905 {
906 BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
906 const size_t size = ARRAY_SIZE(vmcs_field_to_offset_table);
907 unsigned short offset;
907 908
908 if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
909 BUILD_BUG_ON(size > SHRT_MAX);
910 if (field >= size)
909 911 return -ENOENT;
910 912
911 /*
912 * FIXME: Mitigation for CVE-2017-5753. To be replaced with a
913 * generic mechanism.
914 */
915 asm("lfence");
916
917 if (vmcs_field_to_offset_table[field] == 0)
913 field = array_index_nospec(field, size);
914 offset = vmcs_field_to_offset_table[field];
915 if (offset == 0)
918 916 return -ENOENT;
919
920 return vmcs_field_to_offset_table[field];
917 return offset;
921 918 }
922 919
923 920 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
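
vmcs_field_to_offset() previously relied on a bare lfence as a stop-gap Spectre-v1 barrier; the hunk above replaces it with a bounds check followed by array_index_nospec(), which clamps the index through a data dependency instead of a serializing fence. Roughly the shape of the generic C fallback for that clamp, as a stand-alone sketch (the kernel's x86 version uses a cmp/sbb sequence, and the helper name below is my own):

#include <stdio.h>

/* Branch-free mask: ~0UL when index < size, 0UL otherwise. Because the mask is
 * derived from the operands themselves, a mispredicted bounds check still ends
 * up loading element 0 instead of an attacker-chosen out-of-bounds slot. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
        return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * 8 - 1);
}

static const short table[] = { 10, 20, 30, 40 };
#define TABLE_SIZE (sizeof(table) / sizeof(table[0]))

static short lookup(unsigned long field)
{
        if (field >= TABLE_SIZE)
                return -1;
        field &= index_mask_nospec(field, TABLE_SIZE);  /* clamp under misspeculation */
        return table[field];
}

int main(void)
{
        printf("%d %d\n", lookup(2), lookup(100));      /* 30 -1 */
        return 0;
}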
937 937 static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
938 938 static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
939 939 u16 error_code);
940 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
941 static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
942 u32 msr, int type);
940 943
941 944 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
942 945 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
959 959 enum {
960 960 VMX_IO_BITMAP_A,
961 961 VMX_IO_BITMAP_B,
962 VMX_MSR_BITMAP_LEGACY,
963 VMX_MSR_BITMAP_LONGMODE,
964 VMX_MSR_BITMAP_LEGACY_X2APIC_APICV,
965 VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV,
966 VMX_MSR_BITMAP_LEGACY_X2APIC,
967 VMX_MSR_BITMAP_LONGMODE_X2APIC,
968 962 VMX_VMREAD_BITMAP,
969 963 VMX_VMWRITE_BITMAP,
970 964 VMX_BITMAP_NR
968 968
969 969 #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A])
970 970 #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B])
971 #define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY])
972 #define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE])
973 #define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV])
974 #define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV])
975 #define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC])
976 #define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC])
977 971 #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP])
978 972 #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP])
979 973
1911 1911 vmcs_write32(EXCEPTION_BITMAP, eb);
1912 1912 }
1913 1913
1914 /*
1915 * Check if MSR is intercepted for currently loaded MSR bitmap.
1916 */
1917 static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
1918 {
1919 unsigned long *msr_bitmap;
1920 int f = sizeof(unsigned long);
1921
1922 if (!cpu_has_vmx_msr_bitmap())
1923 return true;
1924
1925 msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
1926
1927 if (msr <= 0x1fff) {
1928 return !!test_bit(msr, msr_bitmap + 0x800 / f);
1929 } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
1930 msr &= 0x1fff;
1931 return !!test_bit(msr, msr_bitmap + 0xc00 / f);
1932 }
1933
1934 return true;
1935 }
1936
1937 /*
1938 * Check if MSR is intercepted for L01 MSR bitmap.
1939 */
1940 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
1941 {
1942 unsigned long *msr_bitmap;
1943 int f = sizeof(unsigned long);
1944
1945 if (!cpu_has_vmx_msr_bitmap())
1946 return true;
1947
1948 msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
1949
1950 if (msr <= 0x1fff) {
1951 return !!test_bit(msr, msr_bitmap + 0x800 / f);
1952 } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
1953 msr &= 0x1fff;
1954 return !!test_bit(msr, msr_bitmap + 0xc00 / f);
1955 }
1956
1957 return true;
1958 }
1959
1914 1960 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
1915 1961 unsigned long entry, unsigned long exit)
1916 1962 {
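
Both msr_write_intercepted() helpers above probe the same 4 KiB VMX MSR bitmap that vmx_enable_intercept_for_msr()/vmx_disable_intercept_for_msr() edit further down: read-low bits at byte offset 0x000, read-high at 0x400, write-low at 0x800, write-high at 0xc00, one bit per MSR, and a set bit means the access causes a vmexit. A small user-space model of the write-side lookup (function and variable names invented here):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static bool write_intercepted(const uint8_t *bitmap, uint32_t msr)
{
        uint32_t base;

        if (msr <= 0x1fff) {
                base = 0x800;                           /* write-low */
        } else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
                base = 0xc00;                           /* write-high */
                msr &= 0x1fff;
        } else {
                return true;                            /* outside both ranges: always exits */
        }

        return bitmap[base + msr / 8] & (1u << (msr % 8));
}

int main(void)
{
        static uint8_t bitmap[4096];

        memset(bitmap, 0xff, sizeof(bitmap));              /* intercept everything */
        bitmap[0x800 + 0x48 / 8] &= ~(1u << (0x48 % 8));   /* pass SPEC_CTRL (0x48) writes through */

        printf("SPEC_CTRL write intercepted: %d\n", write_intercepted(bitmap, 0x48));
        printf("STAR      write intercepted: %d\n", write_intercepted(bitmap, 0xc0000081));
        return 0;
}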
2335 2335 if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
2336 2336 per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
2337 2337 vmcs_load(vmx->loaded_vmcs->vmcs);
2338 indirect_branch_prediction_barrier();
2338 2339 }
2339 2340
2340 2341 if (!already_loaded) {
2612 2612 vmx->guest_msrs[from] = tmp;
2613 2613 }
2614 2614
2615 static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
2616 {
2617 unsigned long *msr_bitmap;
2618
2619 if (is_guest_mode(vcpu))
2620 msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap;
2621 else if (cpu_has_secondary_exec_ctrls() &&
2622 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
2623 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
2624 if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) {
2625 if (is_long_mode(vcpu))
2626 msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv;
2627 else
2628 msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv;
2629 } else {
2630 if (is_long_mode(vcpu))
2631 msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
2632 else
2633 msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
2634 }
2635 } else {
2636 if (is_long_mode(vcpu))
2637 msr_bitmap = vmx_msr_bitmap_longmode;
2638 else
2639 msr_bitmap = vmx_msr_bitmap_legacy;
2640 }
2641
2642 vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
2643 }
2644
2645 2615 /*
2646 2616 * Set up the vmcs to automatically save and restore system
2647 2617 * msrs. Don't touch the 64-bit msrs if the guest is in legacy
2652 2652 vmx->save_nmsrs = save_nmsrs;
2653 2653
2654 2654 if (cpu_has_vmx_msr_bitmap())
2655 vmx_set_msr_bitmap(&vmx->vcpu);
2655 vmx_update_msr_bitmap(&vmx->vcpu);
2656 2656 }
2657 2657
2658 2658 /*
3286 3286 case MSR_IA32_TSC:
3287 3287 msr_info->data = guest_read_tsc(vcpu);
3288 3288 break;
3289 case MSR_IA32_SPEC_CTRL:
3290 if (!msr_info->host_initiated &&
3291 !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
3292 !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
3293 return 1;
3294
3295 msr_info->data = to_vmx(vcpu)->spec_ctrl;
3296 break;
3297 case MSR_IA32_ARCH_CAPABILITIES:
3298 if (!msr_info->host_initiated &&
3299 !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
3300 return 1;
3301 msr_info->data = to_vmx(vcpu)->arch_capabilities;
3302 break;
3289 3303 case MSR_IA32_SYSENTER_CS:
3290 3304 msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
3291 3305 break;
3407 3407 case MSR_IA32_TSC:
3408 3408 kvm_write_tsc(vcpu, msr_info);
3409 3409 break;
3410 case MSR_IA32_SPEC_CTRL:
3411 if (!msr_info->host_initiated &&
3412 !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
3413 !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
3414 return 1;
3415
3416 /* The STIBP bit doesn't fault even if it's not advertised */
3417 if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
3418 return 1;
3419
3420 vmx->spec_ctrl = data;
3421
3422 if (!data)
3423 break;
3424
3425 /*
3426 * For non-nested:
3427 * When it's written (to non-zero) for the first time, pass
3428 * it through.
3429 *
3430 * For nested:
3431 * The handling of the MSR bitmap for L2 guests is done in
3432 * nested_vmx_merge_msr_bitmap. We should not touch the
3433 * vmcs02.msr_bitmap here since it gets completely overwritten
3434 * in the merging. We update the vmcs01 here for L1 as well
3435 * since it will end up touching the MSR anyway now.
3436 */
3437 vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
3438 MSR_IA32_SPEC_CTRL,
3439 MSR_TYPE_RW);
3440 break;
3441 case MSR_IA32_PRED_CMD:
3442 if (!msr_info->host_initiated &&
3443 !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
3444 !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
3445 return 1;
3446
3447 if (data & ~PRED_CMD_IBPB)
3448 return 1;
3449
3450 if (!data)
3451 break;
3452
3453 wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
3454
3455 /*
3456 * For non-nested:
3457 * When it's written (to non-zero) for the first time, pass
3458 * it through.
3459 *
3460 * For nested:
3461 * The handling of the MSR bitmap for L2 guests is done in
3462 * nested_vmx_merge_msr_bitmap. We should not touch the
3463 * vmcs02.msr_bitmap here since it gets completely overwritten
3464 * in the merging.
3465 */
3466 vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
3467 MSR_TYPE_W);
3468 break;
3469 case MSR_IA32_ARCH_CAPABILITIES:
3470 if (!msr_info->host_initiated)
3471 return 1;
3472 vmx->arch_capabilities = data;
3473 break;
3410 3474 case MSR_IA32_CR_PAT:
3411 3475 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
3412 3476 if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
3925 3925 return vmcs;
3926 3926 }
3927 3927
3928 static struct vmcs *alloc_vmcs(void)
3929 {
3930 return alloc_vmcs_cpu(raw_smp_processor_id());
3931 }
3932
3933 3928 static void free_vmcs(struct vmcs *vmcs)
3934 3929 {
3935 3930 free_pages((unsigned long)vmcs, vmcs_config.order);
3940 3940 loaded_vmcs_clear(loaded_vmcs);
3941 3941 free_vmcs(loaded_vmcs->vmcs);
3942 3942 loaded_vmcs->vmcs = NULL;
3943 if (loaded_vmcs->msr_bitmap)
3944 free_page((unsigned long)loaded_vmcs->msr_bitmap);
3943 3945 WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
3944 3946 }
3945 3947
3948 static struct vmcs *alloc_vmcs(void)
3949 {
3950 return alloc_vmcs_cpu(raw_smp_processor_id());
3951 }
3952
3953 static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
3954 {
3955 loaded_vmcs->vmcs = alloc_vmcs();
3956 if (!loaded_vmcs->vmcs)
3957 return -ENOMEM;
3958
3959 loaded_vmcs->shadow_vmcs = NULL;
3960 loaded_vmcs_init(loaded_vmcs);
3961
3962 if (cpu_has_vmx_msr_bitmap()) {
3963 loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
3964 if (!loaded_vmcs->msr_bitmap)
3965 goto out_vmcs;
3966 memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
3967 }
3968 return 0;
3969
3970 out_vmcs:
3971 free_loaded_vmcs(loaded_vmcs);
3972 return -ENOMEM;
3973 }
3974
3946 3975 static void free_kvm_area(void)
3947 3976 {
3948 3977 int cpu;
5030 5030 spin_unlock(&vmx_vpid_lock);
5031 5031 }
5032 5032
5033 #define MSR_TYPE_R 1
5034 #define MSR_TYPE_W 2
5035 static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
5036 u32 msr, int type)
5033 static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
5034 u32 msr, int type)
5037 5035 {
5038 5036 int f = sizeof(unsigned long);
5039 5037
5065 5065 }
5066 5066 }
5067 5067
5068 static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
5069 u32 msr, int type)
5070 {
5071 int f = sizeof(unsigned long);
5072
5073 if (!cpu_has_vmx_msr_bitmap())
5074 return;
5075
5076 /*
5077 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
5078 * have the write-low and read-high bitmap offsets the wrong way round.
5079 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
5080 */
5081 if (msr <= 0x1fff) {
5082 if (type & MSR_TYPE_R)
5083 /* read-low */
5084 __set_bit(msr, msr_bitmap + 0x000 / f);
5085
5086 if (type & MSR_TYPE_W)
5087 /* write-low */
5088 __set_bit(msr, msr_bitmap + 0x800 / f);
5089
5090 } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
5091 msr &= 0x1fff;
5092 if (type & MSR_TYPE_R)
5093 /* read-high */
5094 __set_bit(msr, msr_bitmap + 0x400 / f);
5095
5096 if (type & MSR_TYPE_W)
5097 /* write-high */
5098 __set_bit(msr, msr_bitmap + 0xc00 / f);
5099
5100 }
5101 }
5102
5103 static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
5104 u32 msr, int type, bool value)
5105 {
5106 if (value)
5107 vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
5108 else
5109 vmx_disable_intercept_for_msr(msr_bitmap, msr, type);
5110 }
5111
5068 5112 /*
5069 5113 * If a msr is allowed by L0, we should check whether it is allowed by L1.
5070 5114 * The corresponding bit will be cleared unless both of L0 and L1 allow it.
5155 5155 }
5156 5156 }
5157 5157
5158 static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
5158 static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
5159 5159 {
5160 if (!longmode_only)
5161 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy,
5162 msr, MSR_TYPE_R | MSR_TYPE_W);
5163 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode,
5164 msr, MSR_TYPE_R | MSR_TYPE_W);
5160 u8 mode = 0;
5161
5162 if (cpu_has_secondary_exec_ctrls() &&
5163 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
5164 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
5165 mode |= MSR_BITMAP_MODE_X2APIC;
5166 if (enable_apicv && kvm_vcpu_apicv_active(vcpu))
5167 mode |= MSR_BITMAP_MODE_X2APIC_APICV;
5168 }
5169
5170 if (is_long_mode(vcpu))
5171 mode |= MSR_BITMAP_MODE_LM;
5172
5173 return mode;
5165 5174 }
5166 5175
5167 static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active)
5176 #define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
5177
5178 static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
5179 u8 mode)
5168 5180 {
5169 if (apicv_active) {
5170 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv,
5171 msr, type);
5172 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv,
5173 msr, type);
5174 } else {
5175 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
5176 msr, type);
5177 __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
5178 msr, type);
5181 int msr;
5182
5183 for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
5184 unsigned word = msr / BITS_PER_LONG;
5185 msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
5186 msr_bitmap[word + (0x800 / sizeof(long))] = ~0;
5179 5187 }
5188
5189 if (mode & MSR_BITMAP_MODE_X2APIC) {
5190 /*
5191 * TPR reads and writes can be virtualized even if virtual interrupt
5192 * delivery is not in use.
5193 */
5194 vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
5195 if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
5196 vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
5197 vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
5198 vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
5199 }
5200 }
5180 5201 }
5181 5202
5203 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
5204 {
5205 struct vcpu_vmx *vmx = to_vmx(vcpu);
5206 unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
5207 u8 mode = vmx_msr_bitmap_mode(vcpu);
5208 u8 changed = mode ^ vmx->msr_bitmap_mode;
5209
5210 if (!changed)
5211 return;
5212
5213 vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW,
5214 !(mode & MSR_BITMAP_MODE_LM));
5215
5216 if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV))
5217 vmx_update_msr_bitmap_x2apic(msr_bitmap, mode);
5218
5219 vmx->msr_bitmap_mode = mode;
5220 }
5221
5182 5222 static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
5183 5223 {
5184 5224 return enable_apicv;
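
vmx_update_msr_bitmap() replaces the old scheme of selecting one of several pre-built global bitmaps: it recomputes a small mode mask (x2APIC, APICv, long mode) and only touches the per-vCPU bitmap when that mask actually changed. A toy illustration of the recompute-and-diff pattern; all names below are invented:

#include <stdint.h>
#include <stdio.h>

#define MODE_X2APIC        (1u << 0)
#define MODE_X2APIC_APICV  (1u << 1)
#define MODE_LM            (1u << 2)

struct vcpu_model {
        uint8_t  msr_bitmap_mode;
        unsigned rewrites;              /* how often the bitmap was actually edited */
};

static uint8_t compute_mode(int x2apic, int apicv, int long_mode)
{
        uint8_t mode = 0;

        if (x2apic) {
                mode |= MODE_X2APIC;
                if (apicv)
                        mode |= MODE_X2APIC_APICV;
        }
        if (long_mode)
                mode |= MODE_LM;
        return mode;
}

static void update_msr_bitmap(struct vcpu_model *v, int x2apic, int apicv, int long_mode)
{
        uint8_t mode = compute_mode(x2apic, apicv, long_mode);

        if (!(mode ^ v->msr_bitmap_mode))
                return;                 /* nothing changed, bitmap stays as is */

        v->rewrites++;                  /* here the real code edits the per-vCPU bitmap */
        v->msr_bitmap_mode = mode;
}

int main(void)
{
        struct vcpu_model v = { 0, 0 };

        update_msr_bitmap(&v, 0, 0, 1);
        update_msr_bitmap(&v, 0, 0, 1);                       /* no-op: mode unchanged */
        update_msr_bitmap(&v, 1, 1, 1);
        printf("bitmap rewritten %u times\n", v.rewrites);    /* 2 */
        return 0;
}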
5468 5468 }
5469 5469
5470 5470 if (cpu_has_vmx_msr_bitmap())
5471 vmx_set_msr_bitmap(vcpu);
5471 vmx_update_msr_bitmap(vcpu);
5472 5472 }
5473 5473
5474 5474 static u32 vmx_exec_control(struct vcpu_vmx *vmx)
5655 5655 vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
5656 5656 }
5657 5657 if (cpu_has_vmx_msr_bitmap())
5658 vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
5658 vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
5659 5659
5660 5660 vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
5661 5661
5733 5733 ++vmx->nmsrs;
5734 5734 }
5735 5735
5736 if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
5737 rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
5736 5738
5737 5739 vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
5738 5740
5763 5763 u64 cr0;
5764 5764
5765 5765 vmx->rmode.vm86_active = 0;
5766 vmx->spec_ctrl = 0;
5766 5767
5767 5768 vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
5768 5769 kvm_set_cr8(vcpu, 0);
6941 6941
6942 6942 static __init int hardware_setup(void)
6943 6943 {
6944 int r = -ENOMEM, i, msr;
6944 int r = -ENOMEM, i;
6945 6945
6946 6946 rdmsrl_safe(MSR_EFER, &host_efer);
6947 6947
6961 6961
6962 6962 memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE);
6963 6963
6964 memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
6965 memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);
6966
6967 6964 if (setup_vmcs_config(&vmcs_config) < 0) {
6968 6965 r = -EIO;
6969 6966 goto out;
7029 7029 kvm_tsc_scaling_ratio_frac_bits = 48;
7030 7030 }
7031 7031
7032 vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
7033 vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
7034 vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
7035 vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
7036 vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
7037 vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
7038
7039 memcpy(vmx_msr_bitmap_legacy_x2apic_apicv,
7040 vmx_msr_bitmap_legacy, PAGE_SIZE);
7041 memcpy(vmx_msr_bitmap_longmode_x2apic_apicv,
7042 vmx_msr_bitmap_longmode, PAGE_SIZE);
7043 memcpy(vmx_msr_bitmap_legacy_x2apic,
7044 vmx_msr_bitmap_legacy, PAGE_SIZE);
7045 memcpy(vmx_msr_bitmap_longmode_x2apic,
7046 vmx_msr_bitmap_longmode, PAGE_SIZE);
7047
7048 7032 set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
7049 7033
7050 for (msr = 0x800; msr <= 0x8ff; msr++) {
7051 if (msr == 0x839 /* TMCCT */)
7052 continue;
7053 vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true);
7054 }
7055
7056 /*
7057 * TPR reads and writes can be virtualized even if virtual interrupt
7058 * delivery is not in use.
7059 */
7060 vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true);
7061 vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false);
7062
7063 /* EOI */
7064 vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true);
7065 /* SELF-IPI */
7066 vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true);
7067
7068 7034 if (enable_ept)
7069 7035 vmx_enable_tdp();
7070 7036 else
7134 7134 }
7135 7135
7136 7136 /*
7137 * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
7138 * We could reuse a single VMCS for all the L2 guests, but we also want the
7139 * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
7140 * allows keeping them loaded on the processor, and in the future will allow
7141 * optimizations where prepare_vmcs02 doesn't need to set all the fields on
7142 * every entry if they never change.
7143 * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
7144 * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
7145 *
7146 * The following functions allocate and free a vmcs02 in this pool.
7147 */
7148
7149 /* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */
7150 static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
7151 {
7152 struct vmcs02_list *item;
7153 list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
7154 if (item->vmptr == vmx->nested.current_vmptr) {
7155 list_move(&item->list, &vmx->nested.vmcs02_pool);
7156 return &item->vmcs02;
7157 }
7158
7159 if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
7160 /* Recycle the least recently used VMCS. */
7161 item = list_last_entry(&vmx->nested.vmcs02_pool,
7162 struct vmcs02_list, list);
7163 item->vmptr = vmx->nested.current_vmptr;
7164 list_move(&item->list, &vmx->nested.vmcs02_pool);
7165 return &item->vmcs02;
7166 }
7167
7168 /* Create a new VMCS */
7169 item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
7170 if (!item)
7171 return NULL;
7172 item->vmcs02.vmcs = alloc_vmcs();
7173 item->vmcs02.shadow_vmcs = NULL;
7174 if (!item->vmcs02.vmcs) {
7175 kfree(item);
7176 return NULL;
7177 }
7178 loaded_vmcs_init(&item->vmcs02);
7179 item->vmptr = vmx->nested.current_vmptr;
7180 list_add(&(item->list), &(vmx->nested.vmcs02_pool));
7181 vmx->nested.vmcs02_num++;
7182 return &item->vmcs02;
7183 }
7184
7185 /* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
7186 static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
7187 {
7188 struct vmcs02_list *item;
7189 list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
7190 if (item->vmptr == vmptr) {
7191 free_loaded_vmcs(&item->vmcs02);
7192 list_del(&item->list);
7193 kfree(item);
7194 vmx->nested.vmcs02_num--;
7195 return;
7196 }
7197 }
7198
7199 /*
7200 * Free all VMCSs saved for this vcpu, except the one pointed by
7201 * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
7202 * must be &vmx->vmcs01.
7203 */
7204 static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
7205 {
7206 struct vmcs02_list *item, *n;
7207
7208 WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
7209 list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
7210 /*
7211 * Something will leak if the above WARN triggers. Better than
7212 * a use-after-free.
7213 */
7214 if (vmx->loaded_vmcs == &item->vmcs02)
7215 continue;
7216
7217 free_loaded_vmcs(&item->vmcs02);
7218 list_del(&item->list);
7219 kfree(item);
7220 vmx->nested.vmcs02_num--;
7221 }
7222 }
7223
7224 /*
7225 7137 * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(),
7226 7138 * set the success or error code of an emulated VMX instruction, as specified
7227 7139 * by Vol 2B, VMX Instruction Reference, "Conventions".
7313 7313 {
7314 7314 struct vcpu_vmx *vmx = to_vmx(vcpu);
7315 7315 struct vmcs *shadow_vmcs;
7316 int r;
7316 7317
7317 if (cpu_has_vmx_msr_bitmap()) {
7318 vmx->nested.msr_bitmap =
7319 (unsigned long *)__get_free_page(GFP_KERNEL);
7320 if (!vmx->nested.msr_bitmap)
7321 goto out_msr_bitmap;
7322 }
7318 r = alloc_loaded_vmcs(&vmx->nested.vmcs02);
7319 if (r < 0)
7320 goto out_vmcs02;
7323 7321
7324 7322 vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
7325 7323 if (!vmx->nested.cached_vmcs12)
7334 7334 vmx->vmcs01.shadow_vmcs = shadow_vmcs;
7335 7335 }
7336 7336
7337 INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
7338 vmx->nested.vmcs02_num = 0;
7339
7340 7337 hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
7341 7338 HRTIMER_MODE_REL_PINNED);
7342 7339 vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
7345 7345 kfree(vmx->nested.cached_vmcs12);
7346 7346
7347 7347 out_cached_vmcs12:
7348 free_page((unsigned long)vmx->nested.msr_bitmap);
7348 free_loaded_vmcs(&vmx->nested.vmcs02);
7349 7349
7350 out_msr_bitmap:
7350 out_vmcs02:
7351 7351 return -ENOMEM;
7352 7352 }
7353 7353
7490 7490 free_vpid(vmx->nested.vpid02);
7491 7491 vmx->nested.posted_intr_nv = -1;
7492 7492 vmx->nested.current_vmptr = -1ull;
7493 if (vmx->nested.msr_bitmap) {
7494 free_page((unsigned long)vmx->nested.msr_bitmap);
7495 vmx->nested.msr_bitmap = NULL;
7496 }
7497 7493 if (enable_shadow_vmcs) {
7498 7494 vmx_disable_shadow_vmcs(vmx);
7499 7495 vmcs_clear(vmx->vmcs01.shadow_vmcs);
7497 7497 vmx->vmcs01.shadow_vmcs = NULL;
7498 7498 }
7499 7499 kfree(vmx->nested.cached_vmcs12);
7500 /* Unpin physical memory we referred to in current vmcs02 */
7500 /* Unpin physical memory we referred to in the vmcs02 */
7501 7501 if (vmx->nested.apic_access_page) {
7502 7502 kvm_release_page_dirty(vmx->nested.apic_access_page);
7503 7503 vmx->nested.apic_access_page = NULL;
7513 7513 vmx->nested.pi_desc = NULL;
7514 7514 }
7515 7515
7516 nested_free_all_saved_vmcss(vmx);
7516 free_loaded_vmcs(&vmx->nested.vmcs02);
7517 7517 }
7518 7518
7519 7519 /* Emulate the VMXOFF instruction */
7556 7556 vmptr + offsetof(struct vmcs12, launch_state),
7557 7557 &zero, sizeof(zero));
7558 7558
7559 nested_free_vmcs02(vmx, vmptr);
7560
7561 7559 nested_vmx_succeed(vcpu);
7562 7560 return kvm_skip_emulated_instruction(vcpu);
7563 7561 }
8467 8467
8468 8468 /*
8469 8469 * The host physical addresses of some pages of guest memory
8470 * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
8471 * may write to these pages via their host physical address while
8472 * L2 is running, bypassing any address-translation-based dirty
8473 * tracking (e.g. EPT write protection).
8470 * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
8471 * Page). The CPU may write to these pages via their host
8472 * physical address while L2 is running, bypassing any
8473 * address-translation-based dirty tracking (e.g. EPT write
8474 * protection).
8474 8475 *
8475 8476 * Mark them dirty on every exit from L2 to prevent them from
8476 8477 * getting out of sync with dirty tracking.
9005 9005 }
9006 9006 vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
9007 9007
9008 vmx_set_msr_bitmap(vcpu);
9008 vmx_update_msr_bitmap(vcpu);
9009 9009 }
9010 9010
9011 9011 static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
9191 9191 #endif
9192 9192 "pushf\n\t"
9193 9193 __ASM_SIZE(push) " $%c[cs]\n\t"
9194 "call *%[entry]\n\t"
9194 CALL_NOSPEC
9195 9195 :
9196 9196 #ifdef CONFIG_X86_64
9197 9197 [sp]"=&r"(tmp),
9198 9198 #endif
9199 9199 ASM_CALL_CONSTRAINT
9200 9200 :
9201 [entry]"r"(entry),
9201 THUNK_TARGET(entry),
9202 9202 [ss]"i"(__KERNEL_DS),
9203 9203 [cs]"i"(__KERNEL_CS)
9204 9204 );
9435 9435
9436 9436 vmx_arm_hv_timer(vcpu);
9437 9437
9438 /*
9439 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
9440 * it's non-zero. Since vmentry is serialising on affected CPUs, there
9441 * is no need to worry about the conditional branch over the wrmsr
9442 * being speculatively taken.
9443 */
9444 if (vmx->spec_ctrl)
9445 wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
9446
9438 9447 vmx->__launched = vmx->loaded_vmcs->launched;
9439 9448 asm(
9440 9449 /* Store host registers */
9562 9562 #endif
9563 9563 );
9564 9564
9565 /*
9566 * We do not use IBRS in the kernel. If this vCPU has used the
9567 * SPEC_CTRL MSR it may have left it on; save the value and
9568 * turn it off. This is much more efficient than blindly adding
9569 * it to the atomic save/restore list. Especially as the former
9570 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
9571 *
9572 * For non-nested case:
9573 * If the L01 MSR bitmap does not intercept the MSR, then we need to
9574 * save it.
9575 *
9576 * For nested case:
9577 * If the L02 MSR bitmap does not intercept the MSR, then we need to
9578 * save it.
9579 */
9580 if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
9581 rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
9582
9583 if (vmx->spec_ctrl)
9584 wrmsrl(MSR_IA32_SPEC_CTRL, 0);
9585
9565 9586 /* Eliminate branch target predictions from guest mode */
9566 9587 vmexit_fill_RSB();
9567 9588
9696 9696 {
9697 9697 int err;
9698 9698 struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
9699 unsigned long *msr_bitmap;
9699 9700 int cpu;
9700 9701
9701 9702 if (!vmx)
9729 9729 if (!vmx->guest_msrs)
9730 9730 goto free_pml;
9731 9731
9732 vmx->loaded_vmcs = &vmx->vmcs01;
9733 vmx->loaded_vmcs->vmcs = alloc_vmcs();
9734 vmx->loaded_vmcs->shadow_vmcs = NULL;
9735 if (!vmx->loaded_vmcs->vmcs)
9732 err = alloc_loaded_vmcs(&vmx->vmcs01);
9733 if (err < 0)
9736 9734 goto free_msrs;
9737 loaded_vmcs_init(vmx->loaded_vmcs);
9738 9735
9736 msr_bitmap = vmx->vmcs01.msr_bitmap;
9737 vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
9738 vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
9739 vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
9740 vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
9741 vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
9742 vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
9743 vmx->msr_bitmap_mode = 0;
9744
9745 vmx->loaded_vmcs = &vmx->vmcs01;
9739 9746 cpu = get_cpu();
9740 9747 vmx_vcpu_load(&vmx->vcpu, cpu);
9741 9748 vmx->vcpu.cpu = cpu;
10205 10205 int msr;
10206 10206 struct page *page;
10207 10207 unsigned long *msr_bitmap_l1;
10208 unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
10208 unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
10209 /*
10210 * pred_cmd & spec_ctrl are trying to verify two things:
10211 *
10212 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
10213 * ensures that we do not accidentally generate an L02 MSR bitmap
10214 * from the L12 MSR bitmap that is too permissive.
10215 * 2. That L1 or L2s have actually used the MSR. This avoids
10216 * unnecessarily merging of the bitmap if the MSR is unused. This
10217 * works properly because we only update the L01 MSR bitmap lazily.
10218 * So even if L0 should pass L1 these MSRs, the L01 bitmap is only
10219 * updated to reflect this when L1 (or its L2s) actually write to
10220 * the MSR.
10221 */
10222 bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
10223 bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
10209 10224
10210 /* This shortcut is ok because we support only x2APIC MSRs so far. */
10211 if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
10225 if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
10226 !pred_cmd && !spec_ctrl)
10212 10227 return false;
10213 10228
10214 10229 page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
10256 10256 MSR_TYPE_W);
10257 10257 }
10258 10258 }
10259
10260 if (spec_ctrl)
10261 nested_vmx_disable_intercept_for_msr(
10262 msr_bitmap_l1, msr_bitmap_l0,
10263 MSR_IA32_SPEC_CTRL,
10264 MSR_TYPE_R | MSR_TYPE_W);
10265
10266 if (pred_cmd)
10267 nested_vmx_disable_intercept_for_msr(
10268 msr_bitmap_l1, msr_bitmap_l0,
10269 MSR_IA32_PRED_CMD,
10270 MSR_TYPE_W);
10271
10259 10272 kunmap(page);
10260 10273 kvm_release_page_clean(page);
10261 10274
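
The comment above spells out when it is worth building a merged L2 MSR bitmap at all: either x2APIC virtualization is in use, or L1 (or one of its L2s) has actually started using SPEC_CTRL/PRED_CMD, which is detected by the write no longer being intercepted in the L01 bitmap. A compact model of that shortcut, with invented names:

#include <stdbool.h>
#include <stdio.h>

struct l01_state {
        bool virt_x2apic_mode;
        bool spec_ctrl_write_intercepted;
        bool pred_cmd_write_intercepted;
};

static bool nested_msr_bitmap_merge_needed(const struct l01_state *s)
{
        bool spec_ctrl = !s->spec_ctrl_write_intercepted;  /* L1/L2 has used SPEC_CTRL */
        bool pred_cmd  = !s->pred_cmd_write_intercepted;   /* L1/L2 has used PRED_CMD */

        return s->virt_x2apic_mode || spec_ctrl || pred_cmd;
}

int main(void)
{
        struct l01_state idle = { false, true,  true };
        struct l01_state spec = { false, false, true };

        printf("%d %d\n", nested_msr_bitmap_merge_needed(&idle),
                          nested_msr_bitmap_merge_needed(&spec));   /* 0 1 */
        return 0;
}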
10810 10810 if (kvm_has_tsc_control)
10811 10811 decache_tsc_multiplier(vmx);
10812 10812
10813 if (cpu_has_vmx_msr_bitmap())
10814 vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
10815
10813 10816 if (enable_vpid) {
10814 10817 /*
10815 10818 * There is no direct mapping between vpid02 and vpid12, the
11034 11034 {
11035 11035 struct vcpu_vmx *vmx = to_vmx(vcpu);
11036 11036 struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
11037 struct loaded_vmcs *vmcs02;
11038 11037 u32 msr_entry_idx;
11039 11038 u32 exit_qual;
11040 11039
11041 vmcs02 = nested_get_current_vmcs02(vmx);
11042 if (!vmcs02)
11043 return -ENOMEM;
11044
11045 11040 enter_guest_mode(vcpu);
11046 11041
11047 11042 if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
11048 11043 vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
11049 11044
11050 vmx_switch_vmcs(vcpu, vmcs02);
11045 vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
11051 11046 vmx_segment_cache_clear(vmx);
11052 11047
11053 11048 if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) {
11611 11611 vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
11612 11612
11613 11613 if (cpu_has_vmx_msr_bitmap())
11614 vmx_set_msr_bitmap(vcpu);
11614 vmx_update_msr_bitmap(vcpu);
11615 11615
11616 11616 if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
11617 11617 vmcs12->vm_exit_msr_load_count))
11659 11659 vm_entry_controls_reset_shadow(vmx);
11660 11660 vm_exit_controls_reset_shadow(vmx);
11661 11661 vmx_segment_cache_clear(vmx);
11662
11663 /* if no vmcs02 cache requested, remove the one we used */
11664 if (VMCS02_POOL_SIZE == 0)
11665 nested_free_vmcs02(vmx, vmx->nested.current_vmptr);
11666 11662
11667 11663 /* Update any VMCS fields that might have changed while L2 ran */
11668 11664 vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
1009 1009 #endif
1010 1010 MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
1011 1011 MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
1012 MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
1012 1013 };
1013 1014
1014 1015 static unsigned num_msrs_to_save;
27 27 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
28 28 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
29 29 lib-$(CONFIG_RETPOLINE) += retpoline.o
30 OBJECT_FILES_NON_STANDARD_retpoline.o :=y
30 31
31 32 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
32 33
40 40 mov PER_CPU_VAR(current_task), %_ASM_DX
41 41 cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
42 42 jae bad_get_user
43 sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
44 and %_ASM_DX, %_ASM_AX
43 45 ASM_STAC
44 46 1: movzbl (%_ASM_AX),%edx
45 47 xor %eax,%eax
56 56 mov PER_CPU_VAR(current_task), %_ASM_DX
57 57 cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
58 58 jae bad_get_user
59 sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
60 and %_ASM_DX, %_ASM_AX
59 61 ASM_STAC
60 62 2: movzwl -1(%_ASM_AX),%edx
61 63 xor %eax,%eax
72 72 mov PER_CPU_VAR(current_task), %_ASM_DX
73 73 cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
74 74 jae bad_get_user
75 sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
76 and %_ASM_DX, %_ASM_AX
75 77 ASM_STAC
76 78 3: movl -3(%_ASM_AX),%edx
77 79 xor %eax,%eax
89 89 mov PER_CPU_VAR(current_task), %_ASM_DX
90 90 cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
91 91 jae bad_get_user
92 sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
93 and %_ASM_DX, %_ASM_AX
92 94 ASM_STAC
93 95 4: movq -7(%_ASM_AX),%rdx
94 96 xor %eax,%eax
102 102 mov PER_CPU_VAR(current_task), %_ASM_DX
103 103 cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
104 104 jae bad_get_user_8
105 sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */
106 and %_ASM_DX, %_ASM_AX
105 107 ASM_STAC
106 108 4: movl -7(%_ASM_AX),%edx
107 109 5: movl -3(%_ASM_AX),%ecx
7 7 #include <asm/alternative-asm.h>
8 8 #include <asm/export.h>
9 9 #include <asm/nospec-branch.h>
10 #include <asm/bitsperlong.h>
10 11
11 12 .macro THUNK reg
12 13 .section .text.__x86.indirect_thunk
47 47 GENERATE_THUNK(r14)
48 48 GENERATE_THUNK(r15)
49 49 #endif
50
51 /*
52 * Fill the CPU return stack buffer.
53 *
54 * Each entry in the RSB, if used for a speculative 'ret', contains an
55 * infinite 'pause; lfence; jmp' loop to capture speculative execution.
56 *
57 * This is required in various cases for retpoline and IBRS-based
58 * mitigations for the Spectre variant 2 vulnerability. Sometimes to
59 * eliminate potentially bogus entries from the RSB, and sometimes
60 * purely to ensure that it doesn't get empty, which on some CPUs would
61 * allow predictions from other (unwanted!) sources to be used.
62 *
63 * Google experimented with loop-unrolling and this turned out to be
64 * the optimal version - two calls, each with their own speculation
65 * trap should their return address end up getting used, in a loop.
66 */
67 .macro STUFF_RSB nr:req sp:req
68 mov $(\nr / 2), %_ASM_BX
69 .align 16
70 771:
71 call 772f
72 773: /* speculation trap */
73 pause
74 lfence
75 jmp 773b
76 .align 16
77 772:
78 call 774f
79 775: /* speculation trap */
80 pause
81 lfence
82 jmp 775b
83 .align 16
84 774:
85 dec %_ASM_BX
86 jnz 771b
87 add $((BITS_PER_LONG/8) * \nr), \sp
88 .endm
89
90 #define RSB_FILL_LOOPS 16 /* To avoid underflow */
91
92 ENTRY(__fill_rsb)
93 STUFF_RSB RSB_FILL_LOOPS, %_ASM_SP
94 ret
95 END(__fill_rsb)
96 EXPORT_SYMBOL_GPL(__fill_rsb)
97
98 #define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */
99
100 ENTRY(__clear_rsb)
101 STUFF_RSB RSB_CLEAR_LOOPS, %_ASM_SP
102 ret
103 END(__clear_rsb)
104 EXPORT_SYMBOL_GPL(__clear_rsb)
331 331
332 332 unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
333 333 {
334 stac();
334 __uaccess_begin_nospec();
335 335 if (movsl_is_ok(to, from, n))
336 336 __copy_user(to, from, n);
337 337 else
338 338 n = __copy_user_intel(to, from, n);
339 clac();
339 __uaccess_end();
340 340 return n;
341 341 }
342 342 EXPORT_SYMBOL(__copy_user_ll);
344 344 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
345 345 unsigned long n)
346 346 {
347 stac();
347 __uaccess_begin_nospec();
348 348 #ifdef CONFIG_X86_INTEL_USERCOPY
349 349 if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
350 350 n = __copy_user_intel_nocache(to, from, n);
353 353 #else
354 354 __copy_user(to, from, n);
355 355 #endif
356 clac();
356 __uaccess_end();
357 357 return n;
358 358 }
359 359 EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
6 6 #include <linux/interrupt.h>
7 7 #include <linux/export.h>
8 8 #include <linux/cpu.h>
9 #include <linux/debugfs.h>
9 10
10 11 #include <asm/tlbflush.h>
11 12 #include <asm/mmu_context.h>
13 #include <asm/nospec-branch.h>
12 14 #include <asm/cache.h>
13 15 #include <asm/apic.h>
14 16 #include <asm/uv/uv.h>
15 #include <linux/debugfs.h>
16 17
17 18 /*
18 19 * TLB flushing, formerly SMP-only
248 248 } else {
249 249 u16 new_asid;
250 250 bool need_flush;
251 u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
251 252
253 /*
254 * Avoid user/user BTB poisoning by flushing the branch
255 * predictor when switching between processes. This stops
256 * one process from doing Spectre-v2 attacks on another.
257 *
258 * As an optimization, flush indirect branches only when
259 * switching into processes that disable dumping. This
260 * protects high value processes like gpg, without having
261 * too high performance overhead. IBPB is *expensive*!
262 *
263 * This will not flush branches when switching into kernel
264 * threads. It will also not flush if we switch to idle
265 * thread and back to the same process. It will flush if we
266 * switch to a different non-dumpable process.
267 */
268 if (tsk && tsk->mm &&
269 tsk->mm->context.ctx_id != last_ctx_id &&
270 get_dumpable(tsk->mm) != SUID_DUMP_USER)
271 indirect_branch_prediction_barrier();
272
252 273 if (IS_ENABLED(CONFIG_VMAP_STACK)) {
253 274 /*
254 275 * If our current stack is in vmalloc space and isn't
314 314 trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
315 315 }
316 316
317 /*
318 * Record last user mm's context id, so we can avoid
319 * flushing branch buffer with IBPB if we switch back
320 * to the same user.
321 */
322 if (next != &init_mm)
323 this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
324
317 325 this_cpu_write(cpu_tlbstate.loaded_mm, next);
318 326 this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
319 327 }
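
The two hunks above issue an IBPB on context switch only when switching to a different user mm that has disabled dumping, and remember the last user mm's ctx_id so switching back to the same process stays cheap. The decision reduces to a small predicate; a user-space model with invented type names (SUID_DUMP_USER == 1, as in the kernel):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SUID_DUMP_USER 1        /* "dumpable": skip the barrier for these */

struct mm_model {
        uint64_t ctx_id;
        int      dumpable;
};

static bool needs_ibpb(const struct mm_model *next_mm, uint64_t last_ctx_id)
{
        return next_mm &&
               next_mm->ctx_id != last_ctx_id &&        /* really a different mm */
               next_mm->dumpable != SUID_DUMP_USER;     /* and it opted out of dumping */
}

int main(void)
{
        struct mm_model gpg  = { .ctx_id = 42, .dumpable = 0 };
        struct mm_model bash = { .ctx_id = 7,  .dumpable = SUID_DUMP_USER };

        printf("%d %d %d\n",
               needs_ibpb(&gpg, 7),     /* 1: different mm, non-dumpable */
               needs_ibpb(&bash, 42),   /* 0: dumpable */
               needs_ibpb(&gpg, 42));   /* 0: same mm as last time */
        return 0;
}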
399 399 write_cr3(build_cr3(mm->pgd, 0));
400 400
401 401 /* Reinitialize tlbstate. */
402 this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
402 403 this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
403 404 this_cpu_write(cpu_tlbstate.next_asid, 1);
404 405 this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
4162 4162 * TESTING: reset DISP_LIST flag, because: 1)
4163 4163 * this rq this request has passed through
4164 4164 * bfq_prepare_request, 2) then it will have
4165 * bfq_finish_request invoked on it, and 3) in
4166 * bfq_finish_request we use this flag to check
4167 * that bfq_finish_request is not invoked on
4165 * bfq_finish_requeue_request invoked on it, and 3) in
4166 * bfq_finish_requeue_request we use this flag to check
4167 * that bfq_finish_requeue_request is not invoked on
4168 4168 * requests for which bfq_prepare_request has
4169 4169 * been invoked.
4170 4170 */
4173 4173 }
4174 4174
4175 4175 /*
4176 * We exploit the bfq_finish_request hook to decrement
4177 * rq_in_driver, but bfq_finish_request will not be
4176 * We exploit the bfq_finish_requeue_request hook to decrement
4177 * rq_in_driver, but bfq_finish_requeue_request will not be
4178 4178 * invoked on this request. So, to avoid unbalance,
4179 4179 * just start this request, without incrementing
4180 4180 * rq_in_driver. As a negative consequence,
4183 4183 * bfq_schedule_dispatch to be invoked uselessly.
4184 4184 *
4185 4185 * As for implementing an exact solution, the
4186 * bfq_finish_request hook, if defined, is probably
4186 * bfq_finish_requeue_request hook, if defined, is probably
4187 4187 * invoked also on this request. So, by exploiting
4188 4188 * this hook, we could 1) increment rq_in_driver here,
4189 * and 2) decrement it in bfq_finish_request. Such a
4189 * and 2) decrement it in bfq_finish_requeue_request. Such a
4190 4190 * solution would let the value of the counter be
4191 4191 * always accurate, but it would entail using an extra
4192 4192 * interface function. This cost seems higher than the
4878 4878 return idle_timer_disabled;
4879 4879 }
4880 4880
4881 static void bfq_prepare_request(struct request *rq, struct bio *bio);
4882
4881 4883 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
4882 4884 bool at_head)
4883 4885 {
4921 4921 BUG_ON(!(rq->rq_flags & RQF_GOT));
4922 4922 rq->rq_flags &= ~RQF_GOT;
4923 4923
4924 if (!bfqq) {
4925 /*
4926 * This should never happen. Most likely rq is
4927 * a requeued regular request, being
4928 * re-inserted without being first
4929 * re-prepared. Do a prepare, to avoid
4930 * failure.
4931 */
4932			pr_warn("Regular request associated with no queue\n");
4933 WARN_ON(1);
4934 bfq_prepare_request(rq, rq->bio);
4935 bfqq = RQ_BFQQ(rq);
4936 }
4937
4924 4938 #if defined(BFQ_GROUP_IOSCHED_ENABLED) && defined(CONFIG_DEBUG_BLK_CGROUP)
4925 4939 idle_timer_disabled = __bfq_insert_request(bfqd, rq);
4926 4940 /*
5126 5126 }
5127 5127 }
5128 5128
5129 static void bfq_finish_request_body(struct bfq_queue *bfqq)
5129 static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
5130 5130 {
5131 5131 bfq_log_bfqq(bfqq->bfqd, bfqq,
5132 5132 "put_request_body: allocated %d", bfqq->allocated);
5136 5136 bfq_put_queue(bfqq);
5137 5137 }
5138 5138
5139 static void bfq_finish_request(struct request *rq)
5139 /*
5140 * Handle either a requeue or a finish for rq. The things to do are
5141 * the same in both cases: all references to rq are to be dropped. In
5142 * particular, rq is considered completed from the point of view of
5143 * the scheduler.
5144 */
5145 static void bfq_finish_requeue_request(struct request *rq)
5140 5146 {
5141 5147 struct bfq_queue *bfqq;
5142 5148 struct bfq_data *bfqd;
5150 5150
5151 5151 BUG_ON(!rq);
5152 5152
5153 if (!rq->elv.icq)
5153 bfqq = RQ_BFQQ(rq);
5154
5155 /*
5156 * Requeue and finish hooks are invoked in blk-mq without
5157 * checking whether the involved request is actually still
5158 * referenced in the scheduler. To handle this fact, the
5159 * following two checks make this function exit in case of
5160 * spurious invocations, for which there is nothing to do.
5161 *
5162 * First, check whether rq has nothing to do with an elevator.
5163 */
5164 if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
5154 5165 return;
5155 5166
5156 bfqq = RQ_BFQQ(rq);
5157 BUG_ON(!bfqq);
5167 /*
5168 * rq either is not associated with any icq, or is an already
5169 * requeued request that has not (yet) been re-inserted into
5170 * a bfq_queue.
5171 */
5172 if (!rq->elv.icq || !bfqq)
5173 return;
5158 5174
5159 5175 bic = RQ_BIC(rq);
5160 5176 BUG_ON(!bic);
5183 5183 BUG();
5184 5184 }
5185 5185 BUG_ON(rq->rq_flags & RQF_QUEUED);
5186 BUG_ON(!(rq->rq_flags & RQF_ELVPRIV));
5187 5186
5188 5187 bfq_log_bfqq(bfqd, bfqq,
5189 5188 "putting rq %p with %u sects left, STARTED %d",
5203 5203 spin_lock_irqsave(&bfqd->lock, flags);
5204 5204
5205 5205 bfq_completed_request(bfqq, bfqd);
5206 bfq_finish_request_body(bfqq);
5206 bfq_finish_requeue_request_body(bfqq);
5207 5207
5208 5208 spin_unlock_irqrestore(&bfqd->lock, flags);
5209 5209 } else {
5210 5210 /*
5211 5211 * Request rq may be still/already in the scheduler,
5212 * in which case we need to remove it. And we cannot
5212 * in which case we need to remove it (this should
5213 * never happen in case of requeue). And we cannot
5213 5214 * defer such a check and removal, to avoid
5214 5215 * inconsistencies in the time interval from the end
5215 5216 * of this function to the start of the deferred work.
5227 5227 bfqg_stats_update_io_remove(bfqq_group(bfqq),
5228 5228 rq->cmd_flags);
5229 5229 }
5230 bfq_finish_request_body(bfqq);
5230 bfq_finish_requeue_request_body(bfqq);
5231 5231 }
5232 5232
5233 /*
5234 * Reset private fields. In case of a requeue, this allows
5235 * this function to correctly do nothing if it is spuriously
5236 * invoked again on this same request (see the check at the
5237 * beginning of the function). Probably, a better general
5238 * design would be to prevent blk-mq from invoking the requeue
5239 * or finish hooks of an elevator, for a request that is not
5240 * referred by that elevator.
5241 *
5242 * Resetting the following fields would break the
5243 * request-insertion logic if rq is re-inserted into a bfq
5244 * internal queue, without a re-preparation. Here we assume
5245 * that re-insertions of requeued requests, without
5246 * re-preparation, can happen only for pass_through or at_head
5247 * requests (which are not re-inserted into bfq internal
5248 * queues).
5249 */
5233 5250 rq->elv.priv[0] = NULL;
5234 5251 rq->elv.priv[1] = NULL;
5235 5252 }
6015 6015 .ops.mq = {
6016 6016 .limit_depth = bfq_limit_depth,
6017 6017 .prepare_request = bfq_prepare_request,
6018 .finish_request = bfq_finish_request,
6018 .requeue_request = bfq_finish_requeue_request,
6019 .finish_request = bfq_finish_requeue_request,
6019 6020 .exit_icq = bfq_exit_icq,
6020 6021 .insert_requests = bfq_insert_requests,
6021 6022 .dispatch_request = bfq_dispatch_request,
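The comments in the hunk above explain that blk-mq can invoke the requeue and finish hooks for a request the scheduler no longer references, so bfq_finish_requeue_request() must bail out on such spurious calls and reset its private fields afterwards so a repeat invocation does nothing. A minimal, self-contained sketch of that guard-and-reset pattern, using hypothetical stand-in types rather than the real struct request:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct request and its elevator-private state. */
struct fake_queue {
        int allocated;
};

struct fake_request {
        bool elv_priv;                 /* plays the role of RQF_ELVPRIV     */
        struct fake_queue *priv_queue; /* plays the role of rq->elv.priv[1] */
};

/* Shared handler for both "requeue" and "finish": drop scheduler references. */
static void finish_or_requeue(struct fake_request *rq)
{
        if (!rq->elv_priv)             /* never went through the elevator   */
                return;
        if (!rq->priv_queue)           /* already handled: spurious call    */
                return;

        rq->priv_queue->allocated--;
        printf("released, %d still allocated\n", rq->priv_queue->allocated);

        /* Reset private state so a repeated invocation becomes a no-op. */
        rq->priv_queue = NULL;
}

int main(void)
{
        struct fake_queue q = { .allocated = 1 };
        struct fake_request rq = { .elv_priv = true, .priv_queue = &q };

        finish_or_requeue(&rq);        /* does the real work                  */
        finish_or_requeue(&rq);        /* spurious: exits at the second check */
        return 0;
}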
441 441 .remove = img_ascii_lcd_remove,
442 442 };
443 443 module_platform_driver(img_ascii_lcd_driver);
444
445 MODULE_DESCRIPTION("Imagination Technologies ASCII LCD Display");
446 MODULE_AUTHOR("Paul Burton <paul.burton@mips.com>");
447 MODULE_LICENSE("GPL");
147 147 mgr_node = of_parse_phandle(np, "fpga-mgr", 0);
148 148 if (mgr_node) {
149 149 mgr = of_fpga_mgr_get(mgr_node);
150 of_node_put(mgr_node);
150 151 of_node_put(np);
151 152 return mgr;
152 153 }
193 193 parent_br = region_np->parent;
194 194
195 195 /* If overlay has a list of bridges, use it. */
196 if (of_parse_phandle(overlay, "fpga-bridges", 0))
196 br = of_parse_phandle(overlay, "fpga-bridges", 0);
197 if (br) {
198 of_node_put(br);
197 199 np = overlay;
198 else
200 } else {
199 201 np = region_np;
202 }
200 203
201 204 for (i = 0; ; i++) {
202 205 br = of_parse_phandle(np, "fpga-bridges", i);
207 207 break;
208 208
209 209 /* If parent bridge is in list, skip it. */
210 if (br == parent_br)
210 if (br == parent_br) {
211 of_node_put(br);
211 212 continue;
213 }
212 214
213 215 /* If node is a bridge, get it and add to list */
214 216 ret = fpga_bridge_get_to_list(br, region->info,
215 217 &region->bridge_list);
218 of_node_put(br);
216 219
217 220 /* If any of the bridges are in use, give up */
218 221 if (ret == -EBUSY) {
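Both hunks above plug device-tree node reference leaks: every successful of_parse_phandle() takes a reference on the returned node, and that reference must be dropped with of_node_put() on every path, including the parent-bridge skip and the lookup that only tests whether the overlay has an "fpga-bridges" list. A kernel-style sketch of the balanced pattern, with a hypothetical helper name:

#include <linux/of.h>

/* Hypothetical helper: count how many "fpga-bridges" phandles @np lists,
 * dropping every node reference that of_parse_phandle() takes. */
static int count_bridge_phandles(struct device_node *np)
{
        struct device_node *br;
        int i;

        for (i = 0; ; i++) {
                br = of_parse_phandle(np, "fpga-bridges", i);
                if (!br)
                        break;
                /* ... inspect br here if needed ... */
                of_node_put(br);   /* balance the reference, even when skipping */
        }
        return i;
}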
337 337 if (size < LOG_MAX_LINE_SIZE - 1) {
338 338 append_entry(entry, buffer, size);
339 339 } else {
340 append_entry(entry, "LOG_ERROR, line too long\n", 27);
340 append_entry(entry, "LOG_ERROR, line too long\n", 25);
341 341 }
342 342
343 343 va_end(args);
63 63 .id_table = kxsd9_i2c_id,
64 64 };
65 65 module_i2c_driver(kxsd9_i2c_driver);
66
67 MODULE_LICENSE("GPL v2");
68 MODULE_DESCRIPTION("KXSD9 accelerometer I2C interface");
5 5 #include <linux/math64.h>
6 6 #include <linux/log2.h>
7 7 #include <linux/err.h>
8 #include <linux/module.h>
8 9
9 10 #include "qcom-vadc-common.h"
10 11
230 230 return __ffs64(value / VADC_DECIMATION_MIN);
231 231 }
232 232 EXPORT_SYMBOL(qcom_vadc_decimation_from_dt);
233
234 MODULE_LICENSE("GPL v2");
235 MODULE_DESCRIPTION("Qualcomm ADC common functionality");
773 773 struct sockaddr _sockaddr;
774 774 struct sockaddr_in _sockaddr_in;
775 775 struct sockaddr_in6 _sockaddr_in6;
776 struct sockaddr_ib _sockaddr_ib;
776 777 } sgid_addr, dgid_addr;
777 778
778 779
436 436 return 0;
437 437 }
438 438 EXPORT_SYMBOL_GPL(pxa2xx_pinctrl_exit);
439
440 MODULE_AUTHOR("Robert Jarzmik <robert.jarzmik@free.fr>");
441 MODULE_DESCRIPTION("Marvell PXA2xx pinctrl driver");
442 MODULE_LICENSE("GPL v2");
974 974 }
975 975 } else {
976 976 retval = uart_startup(tty, state, 1);
977 if (retval == 0)
978 tty_port_set_initialized(port, true);
977 979 if (retval > 0)
978 980 retval = 0;
979 981 }
10 10 #include <linux/compiler.h>
11 11 #include <linux/spinlock.h>
12 12 #include <linux/rcupdate.h>
13 #include <linux/nospec.h>
13 14 #include <linux/types.h>
14 15 #include <linux/init.h>
15 16 #include <linux/fs.h>
83 83 {
84 84 struct fdtable *fdt = rcu_dereference_raw(files->fdt);
85 85
86 if (fd < fdt->max_fds)
86 if (fd < fdt->max_fds) {
87 fd = array_index_nospec(fd, fdt->max_fds);
87 88 return rcu_dereference_raw(fdt->fd[fd]);
89 }
88 90 return NULL;
89 91 }
90 92
5 5 #include <linux/compiler.h>
6 6 #include <linux/types.h>
7 7
8 /* Built-in __init functions needn't be compiled with retpoline */
9 #if defined(RETPOLINE) && !defined(MODULE)
10 #define __noretpoline __attribute__((indirect_branch("keep")))
11 #else
12 #define __noretpoline
13 #endif
14
8 15 /* These macros are used to mark some functions or
9 16 * initialized data (doesn't apply to uninitialized data)
10 17 * as `initialization' functions. The kernel can take this
47 47
48 48 /* These are for everybody (although not all archs will actually
49 49 discard it in modules) */
50 #define __init __section(.init.text) __cold __latent_entropy
50 #define __init __section(.init.text) __cold __latent_entropy __noretpoline
51 51 #define __initdata __section(.init.data)
52 52 #define __initconst __section(.init.rodata)
53 53 #define __exitdata __section(.exit.data)
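Per the hunk above, __init now also expands to __noretpoline, so built-in (non-module) init code in a RETPOLINE build keeps ordinary indirect branches instead of retpoline thunks. A hedged sketch of what a hypothetical built-in init function effectively becomes after macro expansion:

/* `static int __init foo_init(void)` in a RETPOLINE && !MODULE build,
 * per the definitions above (foo_init is a hypothetical name): */
static int
__section(.init.text) __cold __latent_entropy
__attribute__((indirect_branch("keep")))   /* __noretpoline */
foo_init(void)
{
        return 0;
}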
801 801 static inline void module_bug_cleanup(struct module *mod) {}
802 802 #endif /* CONFIG_GENERIC_BUG */
803 803
804 #ifdef RETPOLINE
805 extern bool retpoline_module_ok(bool has_retpoline);
806 #else
807 static inline bool retpoline_module_ok(bool has_retpoline)
808 {
809 return true;
810 }
811 #endif
812
804 813 #ifdef CONFIG_MODULE_SIG
805 814 static inline bool module_sig_ok(struct module *module)
806 815 {
1 // SPDX-License-Identifier: GPL-2.0
2 // Copyright(c) 2018 Linus Torvalds. All rights reserved.
3 // Copyright(c) 2018 Alexei Starovoitov. All rights reserved.
4 // Copyright(c) 2018 Intel Corporation. All rights reserved.
5
6 #ifndef _LINUX_NOSPEC_H
7 #define _LINUX_NOSPEC_H
8
9 /**
10 * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
11 * @index: array element index
12 * @size: number of elements in array
13 *
14 * When @index is out of bounds (@index >= @size), the sign bit will be
15 * set. Extend the sign bit to all bits and invert, giving a result of
16 * zero for an out of bounds index, or ~0 if within bounds [0, @size).
17 */
18 #ifndef array_index_mask_nospec
19 static inline unsigned long array_index_mask_nospec(unsigned long index,
20 unsigned long size)
21 {
22 /*
23 * Warn developers about inappropriate array_index_nospec() usage.
24 *
25 * Even if the CPU speculates past the WARN_ONCE branch, the
26 * sign bit of @index is taken into account when generating the
27 * mask.
28 *
29 * This warning is compiled out when the compiler can infer that
30 * @index and @size are less than LONG_MAX.
31 */
32 if (WARN_ONCE(index > LONG_MAX || size > LONG_MAX,
33 "array_index_nospec() limited to range of [0, LONG_MAX]\n"))
34 return 0;
35
36 /*
37 * Always calculate and emit the mask even if the compiler
38 * thinks the mask is not needed. The compiler does not take
39 * into account the value of @index under speculation.
40 */
41 OPTIMIZER_HIDE_VAR(index);
42 return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
43 }
44 #endif
45
46 /*
47 * array_index_nospec - sanitize an array index after a bounds check
48 *
49 * For a code sequence like:
50 *
51 * if (index < size) {
52 * index = array_index_nospec(index, size);
53 * val = array[index];
54 * }
55 *
56 * ...if the CPU speculates past the bounds check then
57 * array_index_nospec() will clamp the index within the range of [0,
58 * size).
59 */
60 #define array_index_nospec(index, size) \
61 ({ \
62 typeof(index) _i = (index); \
63 typeof(size) _s = (size); \
64 unsigned long _mask = array_index_mask_nospec(_i, _s); \
65 \
66 BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \
67 BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \
68 \
69 _i &= _mask; \
70 _i; \
71 })
72 #endif /* _LINUX_NOSPEC_H */
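The header's own comment already spells out the intended pattern; for completeness, here is a minimal kernel-style sketch of clamping an untrusted index after its bounds check (the array and function names are hypothetical):

#include <linux/nospec.h>

#define NR_SLOTS 16
static int slots[NR_SLOTS];

/* Hypothetical accessor: @idx comes from an untrusted source. */
static int read_slot(unsigned int idx)
{
        if (idx >= NR_SLOTS)
                return -1;

        /* Even if the CPU speculates past the check above, the clamped
         * index stays within [0, NR_SLOTS). */
        idx = array_index_nospec(idx, NR_SLOTS);
        return slots[idx];
}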
2863 2863 }
2864 2864 #endif /* CONFIG_LIVEPATCH */
2865 2865
2866 static void check_modinfo_retpoline(struct module *mod, struct load_info *info)
2867 {
2868 if (retpoline_module_ok(get_modinfo(info, "retpoline")))
2869 return;
2870
2871 pr_warn("%s: loading module not compiled with retpoline compiler.\n",
2872 mod->name);
2873 }
2874
2866 2875 /* Sets info->hdr and info->len. */
2867 2876 static int copy_module_from_user(const void __user *umod, unsigned long len,
2868 2877 struct load_info *info)
3037 3037 mod->name);
3038 3038 add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
3039 3039 }
3040
3041 check_modinfo_retpoline(mod, info);
3040 3042
3041 3043 if (get_modinfo(info, "staging")) {
3042 3044 add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
16 16 #include <linux/nl80211.h>
17 17 #include <linux/rtnetlink.h>
18 18 #include <linux/netlink.h>
19 #include <linux/nospec.h>
19 20 #include <linux/etherdevice.h>
20 21 #include <net/net_namespace.h>
21 22 #include <net/genetlink.h>
2057 2057 static int parse_txq_params(struct nlattr *tb[],
2058 2058 struct ieee80211_txq_params *txq_params)
2059 2059 {
2060 u8 ac;
2061
2060 2062 if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
2061 2063 !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
2062 2064 !tb[NL80211_TXQ_ATTR_AIFS])
2063 2065 return -EINVAL;
2064 2066
2065 txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
2067 ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
2066 2068 txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
2067 2069 txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
2068 2070 txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
2069 2071 txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
2070 2072
2071 if (txq_params->ac >= NL80211_NUM_ACS)
2073 if (ac >= NL80211_NUM_ACS)
2072 2074 return -EINVAL;
2073
2075 txq_params->ac = array_index_nospec(ac, NL80211_NUM_ACS);
2074 2076 return 0;
2075 2077 }
2076 2078
2165 2165 buf_printf(b, "\nMODULE_INFO(intree, \"Y\");\n");
2166 2166 }
2167 2167
2168 /* Cannot check for assembler */
2169 static void add_retpoline(struct buffer *b)
2170 {
2171 buf_printf(b, "\n#ifdef RETPOLINE\n");
2172 buf_printf(b, "MODULE_INFO(retpoline, \"Y\");\n");
2173 buf_printf(b, "#endif\n");
2174 }
2175
2168 2176 static void add_staging_flag(struct buffer *b, const char *name)
2169 2177 {
2170 2178 static const char *staging_dir = "drivers/staging";
2514 2514 err |= check_modname_len(mod);
2515 2515 add_header(&buf, mod);
2516 2516 add_intree_flag(&buf, !external_module);
2517 add_retpoline(&buf);
2517 2518 add_staging_flag(&buf, mod->name);
2518 2519 err |= add_versions(&buf, mod);
2519 2520 add_depends(&buf, mod, modules);
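For reference, the add_retpoline() hunk above makes modpost emit the following fragment into each generated *.mod.c (exactly what its three buf_printf() calls produce); check_modinfo_retpoline() shown earlier in this diff then reads the resulting "retpoline" modinfo tag and warns when it is absent on a retpoline-enabled kernel:

#ifdef RETPOLINE
MODULE_INFO(retpoline, "Y");
#endif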
70 70 };
71 71
72 72 module_spi_driver(pcm512x_spi_driver);
73
74 MODULE_DESCRIPTION("ASoC PCM512x codec driver - SPI");
75 MODULE_AUTHOR("Mark Brown <broonie@kernel.org>");
76 MODULE_LICENSE("GPL v2");
543 543 dest_off = insn->offset + insn->len + insn->immediate;
544 544 insn->call_dest = find_symbol_by_offset(insn->sec,
545 545 dest_off);
546 /*
547 * FIXME: Thanks to retpolines, it's now considered
548 * normal for a function to call within itself. So
549 * disable this warning for now.
550 */
551 #if 0
552 if (!insn->call_dest) {
553 WARN_FUNC("can't find call dest symbol at offset 0x%lx",
554 insn->sec, insn->offset, dest_off);
546
547 if (!insn->call_dest && !insn->ignore) {
548 WARN_FUNC("unsupported intra-function call",
549 insn->sec, insn->offset);
550 WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
555 551 return -1;
556 552 }
557 #endif
553
558 554 } else if (rela->sym->type == STT_SECTION) {
559 555 insn->call_dest = find_symbol_by_offset(rela->sym->sec,
560 556 rela->addend+4);
594 594 struct instruction *orig_insn,
595 595 struct instruction **new_insn)
596 596 {
597 struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump;
597 struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
598 598 unsigned long dest_off;
599 599
600 600 last_orig_insn = NULL;
610 610 last_orig_insn = insn;
611 611 }
612 612
613 if (!next_insn_same_sec(file, last_orig_insn)) {
614 WARN("%s: don't know how to handle alternatives at end of section",
615 special_alt->orig_sec->name);
616 return -1;
617 }
613 if (next_insn_same_sec(file, last_orig_insn)) {
614 fake_jump = malloc(sizeof(*fake_jump));
615 if (!fake_jump) {
616 WARN("malloc failed");
617 return -1;
618 }
619 memset(fake_jump, 0, sizeof(*fake_jump));
620 INIT_LIST_HEAD(&fake_jump->alts);
621 clear_insn_state(&fake_jump->state);
618 622
619 fake_jump = malloc(sizeof(*fake_jump));
620 if (!fake_jump) {
621 WARN("malloc failed");
622 return -1;
623 fake_jump->sec = special_alt->new_sec;
624 fake_jump->offset = -1;
625 fake_jump->type = INSN_JUMP_UNCONDITIONAL;
626 fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
627 fake_jump->ignore = true;
623 628 }
624 memset(fake_jump, 0, sizeof(*fake_jump));
625 INIT_LIST_HEAD(&fake_jump->alts);
626 clear_insn_state(&fake_jump->state);
627 629
628 fake_jump->sec = special_alt->new_sec;
629 fake_jump->offset = -1;
630 fake_jump->type = INSN_JUMP_UNCONDITIONAL;
631 fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
632 fake_jump->ignore = true;
633
634 630 if (!special_alt->new_len) {
631 if (!fake_jump) {
632 WARN("%s: empty alternative at end of section",
633 special_alt->orig_sec->name);
634 return -1;
635 }
636
635 637 *new_insn = fake_jump;
636 638 return 0;
637 639 }
646 646
647 647 last_new_insn = insn;
648 648
649 insn->ignore = orig_insn->ignore_alts;
650
649 651 if (insn->type != INSN_JUMP_CONDITIONAL &&
650 652 insn->type != INSN_JUMP_UNCONDITIONAL)
651 653 continue;
656 656 continue;
657 657
658 658 dest_off = insn->offset + insn->len + insn->immediate;
659 if (dest_off == special_alt->new_off + special_alt->new_len)
659 if (dest_off == special_alt->new_off + special_alt->new_len) {
660 if (!fake_jump) {
661 WARN("%s: alternative jump to end of section",
662 special_alt->orig_sec->name);
663 return -1;
664 }
660 665 insn->jump_dest = fake_jump;
666 }
661 667
662 668 if (!insn->jump_dest) {
663 669 WARN_FUNC("can't find alternative jump destination",
678 678 return -1;
679 679 }
680 680
681 list_add(&fake_jump->list, &last_new_insn->list);
681 if (fake_jump)
682 list_add(&fake_jump->list, &last_new_insn->list);
682 683
683 684 return 0;
684 685 }
736 736 goto out;
737 737 }
738 738
739 /* Ignore retpoline alternatives. */
740 if (orig_insn->ignore_alts)
741 continue;
742
743 739 new_insn = NULL;
744 740 if (!special_alt->group || special_alt->new_len) {
745 741 new_insn = find_insn(file, special_alt->new_sec,
1092 1092 if (ret)
1093 1093 return ret;
1094 1094
1095 ret = add_call_destinations(file);
1095 ret = add_special_section_alts(file);
1096 1096 if (ret)
1097 1097 return ret;
1098 1098
1099 ret = add_special_section_alts(file);
1099 ret = add_call_destinations(file);
1100 1100 if (ret)
1101 1101 return ret;
1102 1102
1723 1723
1724 1724 insn->visited = true;
1725 1725
1726 list_for_each_entry(alt, &insn->alts, list) {
1727 ret = validate_branch(file, alt->insn, state);
1728 if (ret)
1729 return 1;
1726 if (!insn->ignore_alts) {
1727 list_for_each_entry(alt, &insn->alts, list) {
1728 ret = validate_branch(file, alt->insn, state);
1729 if (ret)
1730 return 1;
1731 }
1730 1732 }
1731 1733
1732 1734 switch (insn->type) {
98 98 struct orc_entry *orc;
99 99 struct rela *rela;
100 100
101 if (!insn_sec->sym) {
102 WARN("missing symbol for section %s", insn_sec->name);
103 return -1;
104 }
105
101 106 /* populate ORC data */
102 107 orc = (struct orc_entry *)u_sec->data->d_buf + idx;
103 108 memcpy(orc, o, sizeof(*orc));