From: "Alexey Shvetsov" <alexxy@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/x11:opencl commit in: sys-devel/llvm/files/, sys-devel/llvm/
Date: Tue, 5 Mar 2013 05:38:17 +0000 (UTC)
Message-ID: <1362461694.9597d3a8e121c0e0961de5642b1f550700dc8496.alexxy@gentoo>
commit: 9597d3a8e121c0e0961de5642b1f550700dc8496
Author: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
AuthorDate: Tue Mar 5 05:34:54 2013 +0000
Commit: Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
CommitDate: Tue Mar 5 05:34:54 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/x11.git;a=commit;h=9597d3a8
Update llvm R600 patch
Package-Manager: portage-2.2.0_alpha166
RepoMan-Options: --force
---
...-Add-R600-backend.patch => R600-Mesa-9.1.patch} | 7809 +++++++++++++-------
sys-devel/llvm/llvm-3.2.ebuild | 57 +-
sys-devel/llvm/metadata.xml | 1 -
3 files changed, 5020 insertions(+), 2847 deletions(-)
diff --git a/sys-devel/llvm/files/0001-Add-R600-backend.patch b/sys-devel/llvm/files/R600-Mesa-9.1.patch
similarity index 81%
rename from sys-devel/llvm/files/0001-Add-R600-backend.patch
rename to sys-devel/llvm/files/R600-Mesa-9.1.patch
index 4ebe499..9b9e1f5 100644
--- a/sys-devel/llvm/files/0001-Add-R600-backend.patch
+++ b/sys-devel/llvm/files/R600-Mesa-9.1.patch
@@ -1,517 +1,46 @@
-From 07d146158af424e4c0aa85a3de49516d97affbb9 Mon Sep 17 00:00:00 2001
-From: Tom Stellard <thomas.stellard@amd.com>
-Date: Tue, 11 Dec 2012 21:25:42 +0000
-Subject: [PATCH] Add R600 backend
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169915 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
- lib/Target/LLVMBuild.txt
-
-[CMake] Fixup R600.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169962 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Avoid setIsInsideBundle in Target/R600.
-
-This function is going to be removed.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170064 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: remove nonsense setPrefLoopAlignment
-
-The Align parameter is a power of two, so 16 results in 64K
-alignment (2^16 = 65536 bytes). Beyond that, even 16-byte alignment
-doesn't make any sense here, so just remove it.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170341 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: BB operand support for SI
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170342 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: enable S_*N2_* instructions
-
-They seem to work fine.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170343 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: New control flow for SI v2
-
-This patch replaces the control flow handling with a new
-pass which structurizes the graph before transforming it to
-machine instructions. This has a couple of advantages
-and currently fixes 20 piglit tests without a single regression.
-
-It is now a general-purpose transformation that can be used
-not only for SI/R6xx, but also for other hardware
-implementations that use a form of structurized control flow.
-
-v2: further cleanup, fixes and documentation
-
-Patch by: Christian König
-
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170591 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: control flow optimization
-
-Branch only when we have enough instructions for it to make sense,
-and remove branches if they don't make sense.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170592 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Remove unnecessary VREG alignment.
-
-Unlike SGPRs, VGPRs don't need to be aligned.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170593 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add entry in CODE_OWNERS.TXT
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170594 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
- CODE_OWNERS.TXT
-
-Target/R600: Update MIB according to r170588.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170620 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Expand vec4 INT <-> FP conversions
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170901 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add SHADOWCUBE to TEX_SHADOW pattern
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170921 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Fix MAX_UINT definition
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170922 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Coding style - remove empty spaces from the beginning of functions
-
-No functionality change.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170923 91177308-0d34-0410-b5e6-96231b3b80d8
----
- CODE_OWNERS.TXT | 14 +
- include/llvm/Intrinsics.td | 1 +
- include/llvm/IntrinsicsR600.td | 36 +
- lib/Target/LLVMBuild.txt | 2 +-
- lib/Target/R600/AMDGPU.h | 49 +
- lib/Target/R600/AMDGPU.td | 40 +
- lib/Target/R600/AMDGPUAsmPrinter.cpp | 138 +
- lib/Target/R600/AMDGPUAsmPrinter.h | 44 +
- lib/Target/R600/AMDGPUCodeEmitter.h | 49 +
- lib/Target/R600/AMDGPUConvertToISA.cpp | 62 +
- lib/Target/R600/AMDGPUISelLowering.cpp | 417 +++
- lib/Target/R600/AMDGPUISelLowering.h | 144 +
- lib/Target/R600/AMDGPUInstrInfo.cpp | 257 ++
- lib/Target/R600/AMDGPUInstrInfo.h | 149 +
- lib/Target/R600/AMDGPUInstrInfo.td | 74 +
- lib/Target/R600/AMDGPUInstructions.td | 190 ++
- lib/Target/R600/AMDGPUIntrinsics.td | 62 +
- lib/Target/R600/AMDGPUMCInstLower.cpp | 83 +
- lib/Target/R600/AMDGPUMCInstLower.h | 34 +
- lib/Target/R600/AMDGPURegisterInfo.cpp | 51 +
- lib/Target/R600/AMDGPURegisterInfo.h | 63 +
- lib/Target/R600/AMDGPURegisterInfo.td | 22 +
- lib/Target/R600/AMDGPUStructurizeCFG.cpp | 714 +++++
- lib/Target/R600/AMDGPUSubtarget.cpp | 87 +
- lib/Target/R600/AMDGPUSubtarget.h | 65 +
- lib/Target/R600/AMDGPUTargetMachine.cpp | 142 +
- lib/Target/R600/AMDGPUTargetMachine.h | 70 +
- lib/Target/R600/AMDIL.h | 106 +
- lib/Target/R600/AMDIL7XXDevice.cpp | 115 +
- lib/Target/R600/AMDIL7XXDevice.h | 72 +
- lib/Target/R600/AMDILBase.td | 85 +
- lib/Target/R600/AMDILCFGStructurizer.cpp | 3049 ++++++++++++++++++++
- lib/Target/R600/AMDILDevice.cpp | 124 +
- lib/Target/R600/AMDILDevice.h | 117 +
- lib/Target/R600/AMDILDeviceInfo.cpp | 94 +
- lib/Target/R600/AMDILDeviceInfo.h | 88 +
- lib/Target/R600/AMDILDevices.h | 19 +
- lib/Target/R600/AMDILEvergreenDevice.cpp | 169 ++
- lib/Target/R600/AMDILEvergreenDevice.h | 93 +
- lib/Target/R600/AMDILFrameLowering.cpp | 47 +
- lib/Target/R600/AMDILFrameLowering.h | 40 +
- lib/Target/R600/AMDILISelDAGToDAG.cpp | 485 ++++
- lib/Target/R600/AMDILISelLowering.cpp | 651 +++++
- lib/Target/R600/AMDILInstrInfo.td | 208 ++
- lib/Target/R600/AMDILIntrinsicInfo.cpp | 79 +
- lib/Target/R600/AMDILIntrinsicInfo.h | 49 +
- lib/Target/R600/AMDILIntrinsics.td | 242 ++
- lib/Target/R600/AMDILNIDevice.cpp | 65 +
- lib/Target/R600/AMDILNIDevice.h | 57 +
- lib/Target/R600/AMDILPeepholeOptimizer.cpp | 1215 ++++++++
- lib/Target/R600/AMDILRegisterInfo.td | 107 +
- lib/Target/R600/AMDILSIDevice.cpp | 45 +
- lib/Target/R600/AMDILSIDevice.h | 39 +
- lib/Target/R600/CMakeLists.txt | 55 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp | 132 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h | 52 +
- lib/Target/R600/InstPrinter/CMakeLists.txt | 7 +
- lib/Target/R600/InstPrinter/LLVMBuild.txt | 24 +
- lib/Target/R600/InstPrinter/Makefile | 15 +
- lib/Target/R600/LLVMBuild.txt | 32 +
- lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp | 90 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp | 85 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h | 30 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h | 60 +
- .../R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp | 113 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h | 55 +
- lib/Target/R600/MCTargetDesc/CMakeLists.txt | 10 +
- lib/Target/R600/MCTargetDesc/LLVMBuild.txt | 23 +
- lib/Target/R600/MCTargetDesc/Makefile | 16 +
- lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp | 575 ++++
- lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp | 298 ++
- lib/Target/R600/Makefile | 23 +
- lib/Target/R600/Processors.td | 29 +
- lib/Target/R600/R600Defines.h | 79 +
- lib/Target/R600/R600ExpandSpecialInstrs.cpp | 334 +++
- lib/Target/R600/R600ISelLowering.cpp | 909 ++++++
- lib/Target/R600/R600ISelLowering.h | 72 +
- lib/Target/R600/R600InstrInfo.cpp | 665 +++++
- lib/Target/R600/R600InstrInfo.h | 169 ++
- lib/Target/R600/R600Instructions.td | 1724 +++++++++++
- lib/Target/R600/R600Intrinsics.td | 32 +
- lib/Target/R600/R600MachineFunctionInfo.cpp | 34 +
- lib/Target/R600/R600MachineFunctionInfo.h | 39 +
- lib/Target/R600/R600RegisterInfo.cpp | 89 +
- lib/Target/R600/R600RegisterInfo.h | 55 +
- lib/Target/R600/R600RegisterInfo.td | 107 +
- lib/Target/R600/R600Schedule.td | 36 +
- lib/Target/R600/SIAnnotateControlFlow.cpp | 330 +++
- lib/Target/R600/SIAssignInterpRegs.cpp | 152 +
- lib/Target/R600/SIISelLowering.cpp | 512 ++++
- lib/Target/R600/SIISelLowering.h | 62 +
- lib/Target/R600/SIInstrFormats.td | 146 +
- lib/Target/R600/SIInstrInfo.cpp | 90 +
- lib/Target/R600/SIInstrInfo.h | 62 +
- lib/Target/R600/SIInstrInfo.td | 589 ++++
- lib/Target/R600/SIInstructions.td | 1351 +++++++++
- lib/Target/R600/SIIntrinsics.td | 52 +
- lib/Target/R600/SILowerControlFlow.cpp | 331 +++
- lib/Target/R600/SILowerLiteralConstants.cpp | 108 +
- lib/Target/R600/SIMachineFunctionInfo.cpp | 20 +
- lib/Target/R600/SIMachineFunctionInfo.h | 34 +
- lib/Target/R600/SIRegisterInfo.cpp | 48 +
- lib/Target/R600/SIRegisterInfo.h | 47 +
- lib/Target/R600/SIRegisterInfo.td | 167 ++
- lib/Target/R600/SISchedule.td | 15 +
- lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp | 26 +
- lib/Target/R600/TargetInfo/CMakeLists.txt | 7 +
- lib/Target/R600/TargetInfo/LLVMBuild.txt | 23 +
- lib/Target/R600/TargetInfo/Makefile | 15 +
- test/CodeGen/R600/add.v4i32.ll | 15 +
- test/CodeGen/R600/and.v4i32.ll | 15 +
- test/CodeGen/R600/fabs.ll | 16 +
- test/CodeGen/R600/fadd.ll | 16 +
- test/CodeGen/R600/fadd.v4f32.ll | 15 +
- test/CodeGen/R600/fcmp-cnd.ll | 14 +
- test/CodeGen/R600/fcmp-cnde-int-args.ll | 16 +
- test/CodeGen/R600/fcmp.ll | 16 +
- test/CodeGen/R600/fdiv.v4f32.ll | 19 +
- test/CodeGen/R600/floor.ll | 16 +
- test/CodeGen/R600/fmax.ll | 16 +
- test/CodeGen/R600/fmin.ll | 16 +
- test/CodeGen/R600/fmul.ll | 16 +
- test/CodeGen/R600/fmul.v4f32.ll | 15 +
- test/CodeGen/R600/fsub.ll | 17 +
- test/CodeGen/R600/fsub.v4f32.ll | 15 +
- test/CodeGen/R600/i8_to_double_to_float.ll | 11 +
- test/CodeGen/R600/icmp-select-sete-reverse-args.ll | 18 +
- test/CodeGen/R600/lit.local.cfg | 13 +
- test/CodeGen/R600/literals.ll | 30 +
- test/CodeGen/R600/llvm.AMDGPU.mul.ll | 17 +
- test/CodeGen/R600/llvm.AMDGPU.trunc.ll | 16 +
- test/CodeGen/R600/llvm.cos.ll | 16 +
- test/CodeGen/R600/llvm.pow.ll | 19 +
- test/CodeGen/R600/llvm.sin.ll | 16 +
- test/CodeGen/R600/load.constant_addrspace.f32.ll | 9 +
- test/CodeGen/R600/load.i8.ll | 10 +
- test/CodeGen/R600/reciprocal.ll | 16 +
- test/CodeGen/R600/sdiv.ll | 21 +
- test/CodeGen/R600/selectcc-icmp-select-float.ll | 15 +
- test/CodeGen/R600/selectcc_cnde.ll | 11 +
- test/CodeGen/R600/selectcc_cnde_int.ll | 11 +
- test/CodeGen/R600/setcc.v4i32.ll | 12 +
- test/CodeGen/R600/short-args.ll | 37 +
- test/CodeGen/R600/store.v4f32.ll | 9 +
- test/CodeGen/R600/store.v4i32.ll | 9 +
- test/CodeGen/R600/udiv.v4i32.ll | 15 +
- test/CodeGen/R600/urem.v4i32.ll | 15 +
- test/CodeGen/R600/vec4-expand.ll | 52 +
- test/CodeGen/SI/sanity.ll | 37 +
- 149 files changed, 21461 insertions(+), 1 deletion(-)
- create mode 100644 include/llvm/IntrinsicsR600.td
- create mode 100644 lib/Target/R600/AMDGPU.h
- create mode 100644 lib/Target/R600/AMDGPU.td
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.cpp
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.h
- create mode 100644 lib/Target/R600/AMDGPUCodeEmitter.h
- create mode 100644 lib/Target/R600/AMDGPUConvertToISA.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.td
- create mode 100644 lib/Target/R600/AMDGPUInstructions.td
- create mode 100644 lib/Target/R600/AMDGPUIntrinsics.td
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.cpp
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.td
- create mode 100644 lib/Target/R600/AMDGPUStructurizeCFG.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.h
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.cpp
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.h
- create mode 100644 lib/Target/R600/AMDIL.h
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.cpp
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.h
- create mode 100644 lib/Target/R600/AMDILBase.td
- create mode 100644 lib/Target/R600/AMDILCFGStructurizer.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.h
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.cpp
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.h
- create mode 100644 lib/Target/R600/AMDILDevices.h
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.cpp
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.h
- create mode 100644 lib/Target/R600/AMDILFrameLowering.cpp
- create mode 100644 lib/Target/R600/AMDILFrameLowering.h
- create mode 100644 lib/Target/R600/AMDILISelDAGToDAG.cpp
- create mode 100644 lib/Target/R600/AMDILISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDILInstrInfo.td
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.cpp
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.h
- create mode 100644 lib/Target/R600/AMDILIntrinsics.td
- create mode 100644 lib/Target/R600/AMDILNIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILNIDevice.h
- create mode 100644 lib/Target/R600/AMDILPeepholeOptimizer.cpp
- create mode 100644 lib/Target/R600/AMDILRegisterInfo.td
- create mode 100644 lib/Target/R600/AMDILSIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILSIDevice.h
- create mode 100644 lib/Target/R600/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
- create mode 100644 lib/Target/R600/InstPrinter/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/LLVMBuild.txt
- create mode 100644 lib/Target/R600/InstPrinter/Makefile
- create mode 100644 lib/Target/R600/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h
- create mode 100644 lib/Target/R600/MCTargetDesc/CMakeLists.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/Makefile
- create mode 100644 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/Makefile
- create mode 100644 lib/Target/R600/Processors.td
- create mode 100644 lib/Target/R600/R600Defines.h
- create mode 100644 lib/Target/R600/R600ExpandSpecialInstrs.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.h
- create mode 100644 lib/Target/R600/R600InstrInfo.cpp
- create mode 100644 lib/Target/R600/R600InstrInfo.h
- create mode 100644 lib/Target/R600/R600Instructions.td
- create mode 100644 lib/Target/R600/R600Intrinsics.td
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.cpp
- create mode 100644 lib/Target/R600/R600RegisterInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.td
- create mode 100644 lib/Target/R600/R600Schedule.td
- create mode 100644 lib/Target/R600/SIAnnotateControlFlow.cpp
- create mode 100644 lib/Target/R600/SIAssignInterpRegs.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.h
- create mode 100644 lib/Target/R600/SIInstrFormats.td
- create mode 100644 lib/Target/R600/SIInstrInfo.cpp
- create mode 100644 lib/Target/R600/SIInstrInfo.h
- create mode 100644 lib/Target/R600/SIInstrInfo.td
- create mode 100644 lib/Target/R600/SIInstructions.td
- create mode 100644 lib/Target/R600/SIIntrinsics.td
- create mode 100644 lib/Target/R600/SILowerControlFlow.cpp
- create mode 100644 lib/Target/R600/SILowerLiteralConstants.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.cpp
- create mode 100644 lib/Target/R600/SIRegisterInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.td
- create mode 100644 lib/Target/R600/SISchedule.td
- create mode 100644 lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp
- create mode 100644 lib/Target/R600/TargetInfo/CMakeLists.txt
- create mode 100644 lib/Target/R600/TargetInfo/LLVMBuild.txt
- create mode 100644 lib/Target/R600/TargetInfo/Makefile
- create mode 100644 test/CodeGen/R600/add.v4i32.ll
- create mode 100644 test/CodeGen/R600/and.v4i32.ll
- create mode 100644 test/CodeGen/R600/fabs.ll
- create mode 100644 test/CodeGen/R600/fadd.ll
- create mode 100644 test/CodeGen/R600/fadd.v4f32.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnd.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnde-int-args.ll
- create mode 100644 test/CodeGen/R600/fcmp.ll
- create mode 100644 test/CodeGen/R600/fdiv.v4f32.ll
- create mode 100644 test/CodeGen/R600/floor.ll
- create mode 100644 test/CodeGen/R600/fmax.ll
- create mode 100644 test/CodeGen/R600/fmin.ll
- create mode 100644 test/CodeGen/R600/fmul.ll
- create mode 100644 test/CodeGen/R600/fmul.v4f32.ll
- create mode 100644 test/CodeGen/R600/fsub.ll
- create mode 100644 test/CodeGen/R600/fsub.v4f32.ll
- create mode 100644 test/CodeGen/R600/i8_to_double_to_float.ll
- create mode 100644 test/CodeGen/R600/icmp-select-sete-reverse-args.ll
- create mode 100644 test/CodeGen/R600/lit.local.cfg
- create mode 100644 test/CodeGen/R600/literals.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.mul.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.trunc.ll
- create mode 100644 test/CodeGen/R600/llvm.cos.ll
- create mode 100644 test/CodeGen/R600/llvm.pow.ll
- create mode 100644 test/CodeGen/R600/llvm.sin.ll
- create mode 100644 test/CodeGen/R600/load.constant_addrspace.f32.ll
- create mode 100644 test/CodeGen/R600/load.i8.ll
- create mode 100644 test/CodeGen/R600/reciprocal.ll
- create mode 100644 test/CodeGen/R600/sdiv.ll
- create mode 100644 test/CodeGen/R600/selectcc-icmp-select-float.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde_int.ll
- create mode 100644 test/CodeGen/R600/setcc.v4i32.ll
- create mode 100644 test/CodeGen/R600/short-args.ll
- create mode 100644 test/CodeGen/R600/store.v4f32.ll
- create mode 100644 test/CodeGen/R600/store.v4i32.ll
- create mode 100644 test/CodeGen/R600/udiv.v4i32.ll
- create mode 100644 test/CodeGen/R600/urem.v4i32.ll
- create mode 100644 test/CodeGen/R600/vec4-expand.ll
- create mode 100644 test/CodeGen/SI/sanity.ll
-
-diff --git a/CODE_OWNERS.TXT b/CODE_OWNERS.TXT
-index fd7bcda..90285be 100644
---- a/CODE_OWNERS.TXT
-+++ b/CODE_OWNERS.TXT
-@@ -49,3 +49,17 @@ D: Register allocators and TableGen
- N: Duncan Sands
- E: baldrick@free.fr
- D: DragonEgg
-+
-+N: Tom Stellard
-+E: thomas.stellard@amd.com
-+E: mesa-dev@lists.freedesktop.org
-+D: R600 Backend
-+
-+N: Andrew Trick
-+E: atrick@apple.com
-+D: IndVar Simplify, Loop Strength Reduction, Instruction Scheduling
-+
-+N: Bill Wendling
-+E: wendling@apple.com
-+D: libLTO & IR Linker
-+
-diff --git a/include/llvm/Intrinsics.td b/include/llvm/Intrinsics.td
-index 2e1597f..059bd80 100644
---- a/include/llvm/Intrinsics.td
-+++ b/include/llvm/Intrinsics.td
-@@ -469,3 +469,4 @@ include "llvm/IntrinsicsXCore.td"
- include "llvm/IntrinsicsHexagon.td"
- include "llvm/IntrinsicsNVVM.td"
- include "llvm/IntrinsicsMips.td"
-+include "llvm/IntrinsicsR600.td"
-diff --git a/include/llvm/IntrinsicsR600.td b/include/llvm/IntrinsicsR600.td
-new file mode 100644
-index 0000000..ecb5668
---- /dev/null
-+++ b/include/llvm/IntrinsicsR600.td
-@@ -0,0 +1,36 @@
-+//===- IntrinsicsR600.td - Defines R600 intrinsics ---------*- tablegen -*-===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+// This file defines all of the R600-specific intrinsics.
-+//
-+//===----------------------------------------------------------------------===//
-+
-+let TargetPrefix = "r600" in {
-+
-+class R600ReadPreloadRegisterIntrinsic<string name>
-+ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
-+ GCCBuiltin<name>;
-+
-+multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
-+ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
-+ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
-+ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
-+}
-+
-+defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_global_size">;
-+defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_local_size">;
-+defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_ngroups">;
-+defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_tgid">;
-+defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
-+ "__builtin_r600_read_tidig">;
-+} // End TargetPrefix = "r600"
+diff --git a/autoconf/configure.ac b/autoconf/configure.ac
+index 7715531..1330c36 100644
+--- a/autoconf/configure.ac
++++ b/autoconf/configure.ac
+@@ -751,6 +751,11 @@ AC_ARG_ENABLE([experimental-targets],AS_HELP_STRING([--enable-experimental-targe
+
+ if test ${enableval} != "disable"
+ then
++ if test ${enableval} = "AMDGPU"
++ then
++ AC_MSG_ERROR([The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600])
++ enableval="R600"
++ fi
+ TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+
+diff --git a/configure b/configure
+index 4fa0705..02012b9 100755
+--- a/configure
++++ b/configure
+@@ -5473,6 +5473,13 @@ fi
+
+ if test ${enableval} != "disable"
+ then
++ if test ${enableval} = "AMDGPU"
++ then
++ { { echo "$as_me:$LINENO: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&5
++echo "$as_me: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&2;}
++ { (exit 1); exit 1; }; }
++ enableval="R600"
++ fi
+ TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+
+@@ -10316,7 +10323,7 @@ else
+ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
+ lt_status=$lt_dlunknown
+ cat > conftest.$ac_ext <<EOF
+-#line 10317 "configure"
++#line 10326 "configure"
+ #include "confdefs.h"
+
+ #if HAVE_DLFCN_H
diff --git a/lib/Target/LLVMBuild.txt b/lib/Target/LLVMBuild.txt
index 8995080..84c4111 100644
--- a/lib/Target/LLVMBuild.txt
@@ -527,10 +56,10 @@ index 8995080..84c4111 100644
; with the best execution engine (the native JIT, if available, or the
diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
new file mode 100644
-index 0000000..0f5125d
+index 0000000..ba87918
--- /dev/null
+++ b/lib/Target/R600/AMDGPU.h
-@@ -0,0 +1,49 @@
+@@ -0,0 +1,51 @@
+//===-- AMDGPU.h - MachineFunction passes hw codegen --------------*- C++ -*-=//
+//
+// The LLVM Compiler Infrastructure
@@ -556,17 +85,19 @@ index 0000000..0f5125d
+// R600 Passes
+FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
+FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm);
+
+// SI Passes
+FunctionPass *createSIAnnotateControlFlowPass();
+FunctionPass *createSIAssignInterpRegsPass(TargetMachine &tm);
+FunctionPass *createSILowerControlFlowPass(TargetMachine &tm);
+FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
-+FunctionPass *createSILowerLiteralConstantsPass(TargetMachine &tm);
++FunctionPass *createSIInsertWaits(TargetMachine &tm);
+
+// Passes common to R600 and SI
+Pass *createAMDGPUStructurizeCFGPass();
+FunctionPass *createAMDGPUConvertToISAPass(TargetMachine &tm);
++FunctionPass* createAMDGPUIndirectAddressingPass(TargetMachine &tm);
+
+} // End namespace llvm
+
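[Editor's note] The AMDGPU.h header above exposes each codegen pass only through a create*Pass factory function, so the target machine can assemble its pipeline without seeing any pass internals. A minimal self-contained sketch of that factory pattern follows; the Pass and TargetMachine types here are illustrative stand-ins, not LLVM's real classes.

#include <iostream>
#include <memory>

// Illustrative stand-ins for LLVM's TargetMachine and FunctionPass.
struct TargetMachine {};
struct FunctionPass {
  virtual ~FunctionPass() {}
  virtual const char *getPassName() const = 0;
};

namespace {
class ExpandSpecialInstrsPass : public FunctionPass {
public:
  explicit ExpandSpecialInstrsPass(TargetMachine &) {}
  const char *getPassName() const { return "Expand special instrs"; }
};
} // End anonymous namespace

// The factory hides the concrete pass type, as the declarations above do.
FunctionPass *createExpandSpecialInstrsPass(TargetMachine &tm) {
  return new ExpandSpecialInstrsPass(tm);
}

int main() {
  TargetMachine tm;
  std::unique_ptr<FunctionPass> P(createExpandSpecialInstrsPass(tm));
  std::cout << P->getPassName() << "\n";
  return 0;
}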
@@ -628,10 +159,10 @@ index 0000000..40f4741
+include "AMDGPUInstructions.td"
diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp b/lib/Target/R600/AMDGPUAsmPrinter.cpp
new file mode 100644
-index 0000000..4553c45
+index 0000000..254e62e
--- /dev/null
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
-@@ -0,0 +1,138 @@
+@@ -0,0 +1,145 @@
+//===-- AMDGPUAsmPrinter.cpp - AMDGPU Assembly printer -------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -681,6 +212,9 @@ index 0000000..4553c45
+#endif
+ }
+ SetupMachineFunction(MF);
++ if (OutStreamer.hasRawTextSupport()) {
++ OutStreamer.EmitRawText("@" + MF.getName() + ":");
++ }
+ OutStreamer.SwitchSection(getObjFileLowering().getTextSection());
+ if (STM.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
+ EmitProgramInfo(MF);
@@ -722,8 +256,6 @@ index 0000000..4553c45
+ switch (reg) {
+ default: break;
+ case AMDGPU::EXEC:
-+ case AMDGPU::SI_LITERAL_CONSTANT:
-+ case AMDGPU::SREG_LIT_0:
+ case AMDGPU::M0:
+ continue;
+ }
@@ -749,10 +281,16 @@ index 0000000..4553c45
+ } else if (AMDGPU::SReg_256RegClass.contains(reg)) {
+ isSGPR = true;
+ width = 8;
++ } else if (AMDGPU::VReg_256RegClass.contains(reg)) {
++ isSGPR = false;
++ width = 8;
++ } else if (AMDGPU::VReg_512RegClass.contains(reg)) {
++ isSGPR = false;
++ width = 16;
+ } else {
+ assert(!"Unknown register class");
+ }
-+ hwReg = RI->getEncodingValue(reg);
++ hwReg = RI->getEncodingValue(reg) & 0xff;
+ maxUsed = hwReg + width - 1;
+ if (isSGPR) {
+ MaxSGPR = maxUsed > MaxSGPR ? maxUsed : MaxSGPR;
@@ -820,61 +358,6 @@ index 0000000..3812282
+} // End namespace llvm
+
+#endif //AMDGPU_ASMPRINTER_H
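[Editor's note] The register accounting in EmitProgramInfo above reduces to: classify each used register as SGPR or VGPR, take the width implied by its register class (1, 2, 4, 8 or 16 slots), and track the highest hardware slot touched. Below is a standalone sketch of that max-tracking step with made-up sample data; the real code derives isSGPR and width from LLVM register classes and masks the encoding with 0xff.

#include <iostream>

struct RegUse { bool isSGPR; unsigned hwReg; unsigned width; };

int main() {
  // Hypothetical uses: (SGPR?, first hw slot, register class width).
  RegUse Uses[] = { {true, 2, 4}, {false, 0, 8}, {false, 5, 16} };
  unsigned MaxSGPR = 0, MaxVGPR = 0;
  for (unsigned i = 0; i < sizeof(Uses) / sizeof(Uses[0]); ++i) {
    // Last hardware slot this value occupies.
    unsigned maxUsed = Uses[i].hwReg + Uses[i].width - 1;
    if (Uses[i].isSGPR)
      MaxSGPR = maxUsed > MaxSGPR ? maxUsed : MaxSGPR;
    else
      MaxVGPR = maxUsed > MaxVGPR ? maxUsed : MaxVGPR;
  }
  std::cout << "MaxSGPR=" << MaxSGPR << " MaxVGPR=" << MaxVGPR << "\n";
  return 0;
}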
-diff --git a/lib/Target/R600/AMDGPUCodeEmitter.h b/lib/Target/R600/AMDGPUCodeEmitter.h
-new file mode 100644
-index 0000000..84f3588
---- /dev/null
-+++ b/lib/Target/R600/AMDGPUCodeEmitter.h
-@@ -0,0 +1,49 @@
-+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief CodeEmitter interface for R600 and SI codegen.
-+//
-+//===----------------------------------------------------------------------===//
-+
-+#ifndef AMDGPUCODEEMITTER_H
-+#define AMDGPUCODEEMITTER_H
-+
-+namespace llvm {
-+
-+class AMDGPUCodeEmitter {
-+public:
-+ uint64_t getBinaryCodeForInstr(const MachineInstr &MI) const;
-+ virtual uint64_t getMachineOpValue(const MachineInstr &MI,
-+ const MachineOperand &MO) const { return 0; }
-+ virtual unsigned GPR4AlignEncode(const MachineInstr &MI,
-+ unsigned OpNo) const {
-+ return 0;
-+ }
-+ virtual unsigned GPR2AlignEncode(const MachineInstr &MI,
-+ unsigned OpNo) const {
-+ return 0;
-+ }
-+ virtual uint64_t VOPPostEncode(const MachineInstr &MI,
-+ uint64_t Value) const {
-+ return Value;
-+ }
-+ virtual uint64_t i32LiteralEncode(const MachineInstr &MI,
-+ unsigned OpNo) const {
-+ return 0;
-+ }
-+ virtual uint32_t SMRDmemriEncode(const MachineInstr &MI, unsigned OpNo)
-+ const {
-+ return 0;
-+ }
-+};
-+
-+} // End namespace llvm
-+
-+#endif // AMDGPUCODEEMITTER_H
diff --git a/lib/Target/R600/AMDGPUConvertToISA.cpp b/lib/Target/R600/AMDGPUConvertToISA.cpp
new file mode 100644
index 0000000..50297d1
@@ -943,12 +426,190 @@ index 0000000..50297d1
+ }
+ return false;
+}
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.cpp b/lib/Target/R600/AMDGPUFrameLowering.cpp
+new file mode 100644
+index 0000000..a3b6936
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUFrameLowering.cpp
+@@ -0,0 +1,122 @@
++//===----------------------- AMDGPUFrameLowering.cpp ----------------------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//==-----------------------------------------------------------------------===//
++//
++// Interface to describe a layout of a stack frame on an AMDIL target machine
++//
++//===----------------------------------------------------------------------===//
++#include "AMDGPUFrameLowering.h"
++#include "AMDGPURegisterInfo.h"
++#include "R600MachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/Instructions.h"
++
++using namespace llvm;
++AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
++ int LAO, unsigned TransAl)
++ : TargetFrameLowering(D, StackAl, LAO, TransAl) { }
++
++AMDGPUFrameLowering::~AMDGPUFrameLowering() { }
++
++unsigned AMDGPUFrameLowering::getStackWidth(const MachineFunction &MF) const {
++
++ // XXX: Hardcoding to 1 for now.
++ //
++ // I think the StackWidth should be stored as metadata associated with the
++ // MachineFunction. This metadata can either be added by a frontend, or
++ // calculated by a R600 specific LLVM IR pass.
++ //
++ // The StackWidth determines how stack objects are laid out in memory.
++ // For a vector stack variable, like: int4 stack[2], the data will be stored
++ // in the following ways depending on the StackWidth.
++ //
++ // StackWidth = 1:
++ //
++ // T0.X = stack[0].x
++ // T1.X = stack[0].y
++ // T2.X = stack[0].z
++ // T3.X = stack[0].w
++ // T4.X = stack[1].x
++ // T5.X = stack[1].y
++ // T6.X = stack[1].z
++ // T7.X = stack[1].w
++ //
++ // StackWidth = 2:
++ //
++ // T0.X = stack[0].x
++ // T0.Y = stack[0].y
++ // T1.X = stack[0].z
++ // T1.Y = stack[0].w
++ // T2.X = stack[1].x
++ // T2.Y = stack[1].y
++ // T3.X = stack[1].z
++ // T3.Y = stack[1].w
++ //
++ // StackWidth = 4:
++ // T0.X = stack[0].x
++ // T0.Y = stack[0].y
++ // T0.Z = stack[0].z
++ // T0.W = stack[0].w
++ // T1.X = stack[1].x
++ // T1.Y = stack[1].y
++ // T1.Z = stack[1].z
++ // T1.W = stack[1].w
++ return 1;
++}
++
++/// \returns The offset, in registers, of the stack object \p FI.
++int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
++ int FI) const {
++ const MachineFrameInfo *MFI = MF.getFrameInfo();
++ unsigned Offset = 0;
++ int UpperBound = FI == -1 ? MFI->getNumObjects() : FI;
++
++ for (int i = MFI->getObjectIndexBegin(); i < UpperBound; ++i) {
++ const AllocaInst *Alloca = MFI->getObjectAllocation(i);
++ unsigned ArrayElements;
++ const Type *AllocaType = Alloca->getAllocatedType();
++ const Type *ElementType;
++
++ if (AllocaType->isArrayTy()) {
++ ArrayElements = AllocaType->getArrayNumElements();
++ ElementType = AllocaType->getArrayElementType();
++ } else {
++ ArrayElements = 1;
++ ElementType = AllocaType;
++ }
++
++ unsigned VectorElements;
++ if (ElementType->isVectorTy()) {
++ VectorElements = ElementType->getVectorNumElements();
++ } else {
++ VectorElements = 1;
++ }
++
++ Offset += (VectorElements / getStackWidth(MF)) * ArrayElements;
++ }
++ return Offset;
++}
++
++const TargetFrameLowering::SpillSlot *
++AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
++ NumEntries = 0;
++ return 0;
++}
++void
++AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
++}
++void
++AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF,
++ MachineBasicBlock &MBB) const {
++}
++
++bool
++AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
++ return false;
++}
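[Editor's note] getFrameIndexOffset above counts registers rather than bytes: every earlier stack object contributes (VectorElements / StackWidth) * ArrayElements slots. A self-contained sketch of that arithmetic follows, reusing the int4 stack[2] example from the StackWidth comment with the hardcoded StackWidth of 1; the names are illustrative.

#include <iostream>

// One record per earlier alloca: vector lanes per element, array length.
struct StackObject { unsigned VectorElements; unsigned ArrayElements; };

unsigned frameIndexOffset(const StackObject *Objs, int UpperBound,
                          unsigned StackWidth) {
  unsigned Offset = 0;
  for (int i = 0; i < UpperBound; ++i)  // sum the sizes of earlier objects
    Offset += (Objs[i].VectorElements / StackWidth) * Objs[i].ArrayElements;
  return Offset;
}

int main() {
  StackObject Objs[] = { {4, 2} };  // int4 stack[2]
  // An object placed after it starts at (4 / 1) * 2 = 8 registers.
  std::cout << frameIndexOffset(Objs, 1, 1) << "\n";
  return 0;
}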
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.h b/lib/Target/R600/AMDGPUFrameLowering.h
+new file mode 100644
+index 0000000..cf5742e
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUFrameLowering.h
+@@ -0,0 +1,44 @@
++//===--------------------- AMDGPUFrameLowering.h ----------------*- C++ -*-===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Interface to describe a layout of a stack frame on an AMDIL target
++/// machine.
++//
++//===----------------------------------------------------------------------===//
++#ifndef AMDILFRAME_LOWERING_H
++#define AMDILFRAME_LOWERING_H
++
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/Target/TargetFrameLowering.h"
++
++namespace llvm {
++
++/// \brief Information about the stack frame layout on the AMDGPU targets.
++///
++/// It holds the direction of the stack growth, the known stack alignment on
++/// entry to each function, and the offset to the locals area.
++/// See TargetFrameInfo for more comments.
++class AMDGPUFrameLowering : public TargetFrameLowering {
++public:
++ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
++ unsigned TransAl = 1);
++ virtual ~AMDGPUFrameLowering();
++
++ /// \returns The number of 32-bit sub-registers that are used when storing
++ /// values to the stack.
++ virtual unsigned getStackWidth(const MachineFunction &MF) const;
++ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
++ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
++ virtual void emitPrologue(MachineFunction &MF) const;
++ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
++ virtual bool hasFP(const MachineFunction &MF) const;
++};
++} // namespace llvm
++#endif // AMDILFRAME_LOWERING_H
diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
new file mode 100644
-index 0000000..473dac4
+index 0000000..d0d23d6
--- /dev/null
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
-@@ -0,0 +1,417 @@
+@@ -0,0 +1,418 @@
+//===-- AMDGPUISelLowering.cpp - AMDGPU Common DAG lowering functions -----===//
+//
+// The LLVM Compiler Infrastructure
@@ -1361,17 +1022,18 @@ index 0000000..473dac4
+ NODE_NAME_CASE(SMIN)
+ NODE_NAME_CASE(UMIN)
+ NODE_NAME_CASE(URECIP)
-+ NODE_NAME_CASE(INTERP)
-+ NODE_NAME_CASE(INTERP_P0)
+ NODE_NAME_CASE(EXPORT)
++ NODE_NAME_CASE(CONST_ADDRESS)
++ NODE_NAME_CASE(REGISTER_LOAD)
++ NODE_NAME_CASE(REGISTER_STORE)
+ }
+}
diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
new file mode 100644
-index 0000000..c7abaf6
+index 0000000..99a11ff
--- /dev/null
+++ b/lib/Target/R600/AMDGPUISelLowering.h
-@@ -0,0 +1,144 @@
+@@ -0,0 +1,140 @@
+//===-- AMDGPUISelLowering.h - AMDGPU Lowering Interface --------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -1427,6 +1089,11 @@ index 0000000..c7abaf6
+ const SmallVectorImpl<ISD::OutputArg> &Outs,
+ const SmallVectorImpl<SDValue> &OutVals,
+ DebugLoc DL, SelectionDAG &DAG) const;
++ virtual SDValue LowerCall(CallLoweringInfo &CLI,
++ SmallVectorImpl<SDValue> &InVals) const {
++ CLI.Callee.dump();
++ llvm_unreachable("Undefined function");
++ }
+
+ virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerIntrinsicIABS(SDValue Op, SelectionDAG &DAG) const;
@@ -1494,35 +1161,26 @@ index 0000000..c7abaf6
+ SMIN,
+ UMIN,
+ URECIP,
-+ INTERP,
-+ INTERP_P0,
+ EXPORT,
++ CONST_ADDRESS,
++ REGISTER_LOAD,
++ REGISTER_STORE,
+ LAST_AMDGPU_ISD_NUMBER
+};
+
+
+} // End namespace AMDGPUISD
+
-+namespace SIISD {
-+
-+enum {
-+ SI_FIRST = AMDGPUISD::LAST_AMDGPU_ISD_NUMBER,
-+ VCC_AND,
-+ VCC_BITCAST
-+};
-+
-+} // End namespace SIISD
-+
+} // End namespace llvm
+
+#endif // AMDGPUISELLOWERING_H
-diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
+diff --git a/lib/Target/R600/AMDGPUIndirectAddressing.cpp b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
new file mode 100644
-index 0000000..e42a46d
+index 0000000..15840b3
--- /dev/null
-+++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
-@@ -0,0 +1,257 @@
-+//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
++++ b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
+@@ -0,0 +1,344 @@
++//===-- AMDGPUIndirectAddressing.cpp - Indirect Addressing Support --------===//
+//
+// The LLVM Compiler Infrastructure
+//
@@ -1532,60 +1190,410 @@ index 0000000..e42a46d
+//===----------------------------------------------------------------------===//
+//
+/// \file
-+/// \brief Implementation of the TargetInstrInfo class that is common to all
-+/// AMD GPUs.
++///
++/// Instructions can use indirect addressing to index the register file as if it
++/// were memory. This pass lowers RegisterLoad and RegisterStore instructions
++/// to either a COPY or a MOV that uses indirect addressing.
+//
+//===----------------------------------------------------------------------===//
+
-+#include "AMDGPUInstrInfo.h"
-+#include "AMDGPURegisterInfo.h"
-+#include "AMDGPUTargetMachine.h"
-+#include "AMDIL.h"
-+#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "AMDGPU.h"
++#include "R600InstrInfo.h"
++#include "R600MachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
-+
-+#define GET_INSTRINFO_CTOR
-+#include "AMDGPUGenInstrInfo.inc"
++#include "llvm/Support/Debug.h"
+
+using namespace llvm;
+
-+AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
-+ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
++namespace {
+
-+const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
-+ return RI;
-+}
++class AMDGPUIndirectAddressingPass : public MachineFunctionPass {
+
-+bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
-+ unsigned &SrcReg, unsigned &DstReg,
-+ unsigned &SubIdx) const {
-+// TODO: Implement this function
-+ return false;
-+}
++private:
++ static char ID;
++ const AMDGPUInstrInfo *TII;
+
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
-+ int &FrameIndex) const {
-+// TODO: Implement this function
-+ return 0;
-+}
++ bool regHasExplicitDef(MachineRegisterInfo &MRI, unsigned Reg) const;
+
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
-+ int &FrameIndex) const {
-+// TODO: Implement this function
-+ return 0;
-+}
++public:
++ AMDGPUIndirectAddressingPass(TargetMachine &tm) :
++ MachineFunctionPass(ID),
++ TII(static_cast<const AMDGPUInstrInfo*>(tm.getInstrInfo()))
++ { }
+
-+bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
-+ const MachineMemOperand *&MMO,
-+ int &FrameIndex) const {
-+// TODO: Implement this function
-+ return false;
++ virtual bool runOnMachineFunction(MachineFunction &MF);
++
++ const char *getPassName() const { return "R600 Handle indirect addressing"; }
++
++};
++
++} // End anonymous namespace
++
++char AMDGPUIndirectAddressingPass::ID = 0;
++
++FunctionPass *llvm::createAMDGPUIndirectAddressingPass(TargetMachine &tm) {
++ return new AMDGPUIndirectAddressingPass(tm);
+}
-+unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
-+ int &FrameIndex) const {
-+// TODO: Implement this function
-+ return 0;
++
++bool AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
++ MachineRegisterInfo &MRI = MF.getRegInfo();
++
++ int IndirectBegin = TII->getIndirectIndexBegin(MF);
++ int IndirectEnd = TII->getIndirectIndexEnd(MF);
++
++ if (IndirectBegin == -1) {
++ // No indirect addressing; we can skip this pass.
++ assert(IndirectEnd == -1);
++ return false;
++ }
++
++ // The map keeps track of the indirect address that is represented by
++ // each virtual register. The key is the register and the value is the
++ // indirect address it uses.
++ std::map<unsigned, unsigned> RegisterAddressMap;
++
++ // First pass - Lower all of the RegisterStore instructions and track which
++ // registers are live.
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++ BB != BB_E; ++BB) {
++ // This map keeps track of the current live indirect registers.
++ // The key is the address and the value is the register
++ std::map<unsigned, unsigned> LiveAddressRegisterMap;
++ MachineBasicBlock &MBB = *BB;
++
++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
++ I != MBB.end(); I = Next) {
++ Next = llvm::next(I);
++ MachineInstr &MI = *I;
++
++ if (!TII->isRegisterStore(MI)) {
++ continue;
++ }
++
++ // Lower RegisterStore
++
++ unsigned RegIndex = MI.getOperand(2).getImm();
++ unsigned Channel = MI.getOperand(3).getImm();
++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
++ const TargetRegisterClass *IndirectStoreRegClass =
++ TII->getIndirectAddrStoreRegClass(MI.getOperand(0).getReg());
++
++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
++ // Direct register access.
++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
++
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), DstReg)
++ .addOperand(MI.getOperand(0));
++
++ RegisterAddressMap[DstReg] = Address;
++ LiveAddressRegisterMap[Address] = DstReg;
++ } else {
++ // Indirect register access.
++ MachineInstrBuilder MOV = TII->buildIndirectWrite(BB, I,
++ MI.getOperand(0).getReg(), // Value
++ Address,
++ MI.getOperand(1).getReg()); // Offset
++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
++ unsigned Addr = TII->calculateIndirectAddress(i, Channel);
++ unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
++ MOV.addReg(DstReg, RegState::Define | RegState::Implicit);
++ RegisterAddressMap[DstReg] = Addr;
++ LiveAddressRegisterMap[Addr] = DstReg;
++ }
++ }
++ MI.eraseFromParent();
++ }
++
++ // Update the live-ins of the successor blocks
++ for (MachineBasicBlock::succ_iterator Succ = MBB.succ_begin(),
++ SuccEnd = MBB.succ_end();
++ SuccEnd != Succ; ++Succ) {
++ std::map<unsigned, unsigned>::const_iterator Key, KeyEnd;
++ for (Key = LiveAddressRegisterMap.begin(),
++ KeyEnd = LiveAddressRegisterMap.end(); KeyEnd != Key; ++Key) {
++ (*Succ)->addLiveIn(Key->second);
++ }
++ }
++ }
++
++ // Second pass - Lower the RegisterLoad instructions
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++ BB != BB_E; ++BB) {
++ // Key is the address and the value is the register
++ std::map<unsigned, unsigned> LiveAddressRegisterMap;
++ MachineBasicBlock &MBB = *BB;
++
++ MachineBasicBlock::livein_iterator LI = MBB.livein_begin();
++ while (LI != MBB.livein_end()) {
++ std::vector<unsigned> PhiRegisters;
++
++ // Make sure this live-in is used for indirect addressing
++ if (RegisterAddressMap.find(*LI) == RegisterAddressMap.end()) {
++ ++LI;
++ continue;
++ }
++
++ unsigned Address = RegisterAddressMap[*LI];
++ LiveAddressRegisterMap[Address] = *LI;
++ PhiRegisters.push_back(*LI);
++
++ // Check if there are other live in registers which map to the same
++ // indirect address.
++ for (MachineBasicBlock::livein_iterator LJ = llvm::next(LI),
++ LE = MBB.livein_end();
++ LJ != LE; ++LJ) {
++ unsigned Reg = *LJ;
++ if (RegisterAddressMap.find(Reg) == RegisterAddressMap.end()) {
++ continue;
++ }
++
++ if (RegisterAddressMap[Reg] == Address) {
++ PhiRegisters.push_back(Reg);
++ }
++ }
++
++ if (PhiRegisters.size() == 1) {
++ // We don't need to insert a Phi instruction, so we can just add the
++ // registers to the live list for the block.
++ LiveAddressRegisterMap[Address] = *LI;
++ MBB.removeLiveIn(*LI);
++ } else {
++ // We need to insert a PHI, because we have the same address being
++ // written in multiple predecessor blocks.
++ const TargetRegisterClass *PhiDstClass =
++ TII->getIndirectAddrStoreRegClass(*(PhiRegisters.begin()));
++ unsigned PhiDstReg = MRI.createVirtualRegister(PhiDstClass);
++ MachineInstrBuilder Phi = BuildMI(MBB, MBB.begin(),
++ MBB.findDebugLoc(MBB.begin()),
++ TII->get(AMDGPU::PHI), PhiDstReg);
++
++ for (std::vector<unsigned>::const_iterator RI = PhiRegisters.begin(),
++ RE = PhiRegisters.end();
++ RI != RE; ++RI) {
++ unsigned Reg = *RI;
++ MachineInstr *DefInst = MRI.getVRegDef(Reg);
++ assert(DefInst);
++ MachineBasicBlock *RegBlock = DefInst->getParent();
++ Phi.addReg(Reg);
++ Phi.addMBB(RegBlock);
++ MBB.removeLiveIn(Reg);
++ }
++ RegisterAddressMap[PhiDstReg] = Address;
++ LiveAddressRegisterMap[Address] = PhiDstReg;
++ }
++ LI = MBB.livein_begin();
++ }
++
++ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
++ I != MBB.end(); I = Next) {
++ Next = llvm::next(I);
++ MachineInstr &MI = *I;
++
++ if (!TII->isRegisterLoad(MI)) {
++ if (MI.getOpcode() == AMDGPU::PHI) {
++ continue;
++ }
++ // Check for indirect register defs
++ for (unsigned OpIdx = 0, NumOperands = MI.getNumOperands();
++ OpIdx < NumOperands; ++OpIdx) {
++ MachineOperand &MO = MI.getOperand(OpIdx);
++ if (MO.isReg() && MO.isDef() &&
++ RegisterAddressMap.find(MO.getReg()) != RegisterAddressMap.end()) {
++ unsigned Reg = MO.getReg();
++ unsigned LiveAddress = RegisterAddressMap[Reg];
++ // Chain the live-ins
++ if (LiveAddressRegisterMap.find(LiveAddress) !=
++ RegisterAddressMap.end()) {
++ MI.addOperand(MachineOperand::CreateReg(
++ LiveAddressRegisterMap[LiveAddress],
++ false, // isDef
++ true, // isImp
++ true)); // isKill
++ }
++ LiveAddressRegisterMap[LiveAddress] = Reg;
++ }
++ }
++ continue;
++ }
++
++ const TargetRegisterClass *SuperIndirectRegClass =
++ TII->getSuperIndirectRegClass();
++ const TargetRegisterClass *IndirectLoadRegClass =
++ TII->getIndirectAddrLoadRegClass();
++ unsigned IndirectReg = MRI.createVirtualRegister(SuperIndirectRegClass);
++
++ unsigned RegIndex = MI.getOperand(2).getImm();
++ unsigned Channel = MI.getOperand(3).getImm();
++ unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
++
++ if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
++ // Direct register access
++ unsigned Reg = LiveAddressRegisterMap[Address];
++ unsigned AddrReg = IndirectLoadRegClass->getRegister(Address);
++
++ if (regHasExplicitDef(MRI, Reg)) {
++ // If the register we are reading from has an explicit def, then that
++ // means it was written via a direct register access (i.e. COPY
++ // or other instruction that doesn't use indirect addressing). In
++ // this case we know where the value has been stored, so we can just
++ // issue a copy.
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
++ MI.getOperand(0).getReg())
++ .addReg(Reg);
++ } else {
++ // If the register we are reading has an implicit def, then that
++ // means it was written by an indirect register access (i.e. an
++ // instruction that uses indirect addressing).
++ BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
++ MI.getOperand(0).getReg())
++ .addReg(AddrReg)
++ .addReg(Reg, RegState::Implicit);
++ }
++ } else {
++ // Indirect register access
++
++ // Note on REG_SEQUENCE instructions: you can't actually use the register
++ // it defines unless you have an instruction that takes the defined
++ // register class as an operand.
++
++ MachineInstrBuilder Sequence = BuildMI(MBB, I, MBB.findDebugLoc(I),
++ TII->get(AMDGPU::REG_SEQUENCE),
++ IndirectReg);
++ for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
++ unsigned Addr = TII->calculateIndirectAddress(i, Channel);
++ if (LiveAddressRegisterMap.find(Addr) == LiveAddressRegisterMap.end()) {
++ continue;
++ }
++ unsigned Reg = LiveAddressRegisterMap[Addr];
++
++ // We only need to use REG_SEQUENCE for explicit defs, since the
++ // register coalescer won't do anything with the implicit defs.
++ MachineInstr *DefInstr = MRI.getVRegDef(Reg);
++ if (!regHasExplicitDef(MRI, Reg)) {
++ continue;
++ }
++
++ // Insert a REG_SEQUENCE instruction to force the register allocator
++ // to allocate the virtual register to the correct physical register.
++ Sequence.addReg(LiveAddressRegisterMap[Addr]);
++ Sequence.addImm(TII->getRegisterInfo().getIndirectSubReg(Addr));
++ }
++ MachineInstrBuilder Mov = TII->buildIndirectRead(BB, I,
++ MI.getOperand(0).getReg(), // Value
++ Address,
++ MI.getOperand(1).getReg()); // Offset
++
++
++
++ Mov.addReg(IndirectReg, RegState::Implicit | RegState::Kill);
++ Mov.addReg(LiveAddressRegisterMap[Address], RegState::Implicit);
++
++ }
++ MI.eraseFromParent();
++ }
++ }
++ return false;
++}
++
++bool AMDGPUIndirectAddressingPass::regHasExplicitDef(MachineRegisterInfo &MRI,
++ unsigned Reg) const {
++ MachineInstr *DefInstr = MRI.getVRegDef(Reg);
++
++ if (!DefInstr) {
++ return false;
++ }
++
++ if (DefInstr->getOpcode() == AMDGPU::PHI) {
++ bool Explicit = false;
++ for (MachineInstr::const_mop_iterator I = DefInstr->operands_begin(),
++ E = DefInstr->operands_end();
++ I != E; ++I) {
++ const MachineOperand &MO = *I;
++ if (!MO.isReg() || MO.isDef()) {
++ continue;
++ }
++
++ Explicit = Explicit || regHasExplicitDef(MRI, MO.getReg());
++ }
++ return Explicit;
++ }
++
++ return DefInstr->getOperand(0).isReg() &&
++ DefInstr->getOperand(0).getReg() == Reg;
++}
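[Editor's note] The pass above leaves the actual (RegIndex, Channel) -> address mapping to the target's calculateIndirectAddress hook, which this excerpt only declares. Purely as a hypothetical illustration of such a linearization (not R600's real encoding), one could interleave the four channels:

#include <iostream>

// Hypothetical mapping; the real calculateIndirectAddress is target-defined
// and not part of this patch.
unsigned calculateIndirectAddress(unsigned RegIndex, unsigned Channel) {
  const unsigned NumChannels = 4;  // x, y, z, w
  return RegIndex * NumChannels + Channel;
}

int main() {
  // Register 2, channel .z maps to one flat "virtual memory" address.
  std::cout << calculateIndirectAddress(2, 2) << "\n";  // prints 10
  return 0;
}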
+diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
+new file mode 100644
+index 0000000..640707d
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
+@@ -0,0 +1,266 @@
++//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Implementation of the TargetInstrInfo class that is common to all
++/// AMD GPUs.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPUInstrInfo.h"
++#include "AMDGPURegisterInfo.h"
++#include "AMDGPUTargetMachine.h"
++#include "AMDIL.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++
++#define GET_INSTRINFO_CTOR
++#include "AMDGPUGenInstrInfo.inc"
++
++using namespace llvm;
++
++AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
++ : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
++
++const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
++ return RI;
++}
++
++bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
++ unsigned &SrcReg, unsigned &DstReg,
++ unsigned &SubIdx) const {
++// TODO: Implement this function
++ return false;
++}
++
++unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
++ int &FrameIndex) const {
++// TODO: Implement this function
++ return 0;
++}
++
++unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
++ int &FrameIndex) const {
++// TODO: Implement this function
++ return 0;
++}
++
++bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
++ const MachineMemOperand *&MMO,
++ int &FrameIndex) const {
++// TODO: Implement this function
++ return false;
++}
++unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
++ int &FrameIndex) const {
++// TODO: Implement this function
++ return 0;
+}
+unsigned AMDGPUInstrInfo::isStoreFromStackSlotPostFE(const MachineInstr *MI,
+ int &FrameIndex) const {
@@ -1758,7 +1766,16 @@ index 0000000..e42a46d
+ // TODO: Implement this function
+ return true;
+}
-+
++
++bool AMDGPUInstrInfo::isRegisterStore(const MachineInstr &MI) const {
++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_STORE;
++}
++
++bool AMDGPUInstrInfo::isRegisterLoad(const MachineInstr &MI) const {
++ return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_LOAD;
++}
++
++
+void AMDGPUInstrInfo::convertToISA(MachineInstr & MI, MachineFunction &MF,
+ DebugLoc DL) const {
+ MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -1781,10 +1798,10 @@ index 0000000..e42a46d
+}
diff --git a/lib/Target/R600/AMDGPUInstrInfo.h b/lib/Target/R600/AMDGPUInstrInfo.h
new file mode 100644
-index 0000000..32ac691
+index 0000000..5220aa0
--- /dev/null
+++ b/lib/Target/R600/AMDGPUInstrInfo.h
-@@ -0,0 +1,149 @@
+@@ -0,0 +1,207 @@
+//===-- AMDGPUInstrInfo.h - AMDGPU Instruction Information ------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -1828,9 +1845,10 @@ index 0000000..32ac691
+class AMDGPUInstrInfo : public AMDGPUGenInstrInfo {
+private:
+ const AMDGPURegisterInfo RI;
-+ TargetMachine &TM;
+ bool getNextBranchInstr(MachineBasicBlock::iterator &iter,
+ MachineBasicBlock &MBB) const;
++protected:
++ TargetMachine &TM;
+public:
+ explicit AMDGPUInstrInfo(TargetMachine &tm);
+
@@ -1918,12 +1936,66 @@ index 0000000..32ac691
+ bool isAExtLoadInst(llvm::MachineInstr *MI) const;
+ bool isStoreInst(llvm::MachineInstr *MI) const;
+ bool isTruncStoreInst(llvm::MachineInstr *MI) const;
++ bool isRegisterStore(const MachineInstr &MI) const;
++ bool isRegisterLoad(const MachineInstr &MI) const;
++
++//===---------------------------------------------------------------------===//
++// Pure virtual functions to be implemented by sub-classes.
++//===---------------------------------------------------------------------===//
+
+ virtual MachineInstr* getMovImmInstr(MachineFunction *MF, unsigned DstReg,
+ int64_t Imm) const = 0;
+ virtual unsigned getIEQOpcode() const = 0;
+ virtual bool isMov(unsigned opcode) const = 0;
+
++ /// \returns the smallest register index that will be accessed by an indirect
++ /// read or write or -1 if indirect addressing is not used by this program.
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const = 0;
++
++ /// \returns the largest register index that will be accessed by an indirect
++ /// read or write or -1 if indirect addressing is not used by this program.
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const = 0;
++
++ /// \brief Calculate the "Indirect Address" for the given \p RegIndex and
++ /// \p Channel
++ ///
++ /// We model indirect addressing using a virtual address space that can be
++ /// accessed with loads and stores. The "Indirect Address" is the memory
++ /// address in this virtual address space that maps to the given \p RegIndex
++ /// and \p Channel.
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++ unsigned Channel) const = 0;
++
++ /// \returns The register class to be used for storing values to an
++ /// "Indirect Address" .
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++ unsigned SourceReg) const = 0;
++
++ /// \returns The register class to be used for loading values from
++ /// an "Indirect Address" .
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const = 0;
++
++ /// \brief Build instruction(s) for an indirect register write.
++ ///
++ /// \returns The instruction that performs the indirect register write
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const = 0;
++
++ /// \brief Build instruction(s) for an indirect register read.
++ ///
++ /// \returns The instruction that performs the indirect register read
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const = 0;
++
++ /// \returns the register class whose sub registers are the set of all
++ /// possible registers that can be used for indirect addressing.
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const = 0;
++
++
+ /// \brief Convert the AMDIL MachineInstr to a supported ISA
+ /// MachineInstr
+ virtual void convertToISA(MachineInstr & MI, MachineFunction &MF,
@@ -1933,13 +2005,16 @@ index 0000000..32ac691
+
+} // End llvm namespace
+
++#define AMDGPU_FLAG_REGISTER_LOAD (UINT64_C(1) << 63)
++#define AMDGPU_FLAG_REGISTER_STORE (UINT64_C(1) << 62)
++
+#endif // AMDGPUINSTRINFO_H
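
To trace the new register-load/store flags end to end: TableGen materializes
the "let TSFlags{63} = isRegisterLoad" / "let TSFlags{62} = isRegisterStore"
bits from AMDGPUInstructions.td (below) into each opcode's descriptor, and
AMDGPUInstrInfo::isRegisterLoad/isRegisterStore simply mask them with the two
macros above. A self-contained model of that handshake; FakeInstrDesc is a
hypothetical stand-in for llvm::MCInstrDesc:

#include <cassert>
#include <cstdint>

// Same bit positions as AMDGPU_FLAG_REGISTER_LOAD/STORE above.
const uint64_t FlagRegisterLoad  = UINT64_C(1) << 63;
const uint64_t FlagRegisterStore = UINT64_C(1) << 62;

struct FakeInstrDesc { // stand-in for llvm::MCInstrDesc
  uint64_t TSFlags;
};

// Mirror of the isRegisterLoad/isRegisterStore predicates.
bool isRegisterLoad(const FakeInstrDesc &D)  { return (D.TSFlags & FlagRegisterLoad)  != 0; }
bool isRegisterStore(const FakeInstrDesc &D) { return (D.TSFlags & FlagRegisterStore) != 0; }

int main() {
  FakeInstrDesc Load  = { FlagRegisterLoad };
  FakeInstrDesc Store = { FlagRegisterStore };
  assert(isRegisterLoad(Load)   && !isRegisterStore(Load));
  assert(isRegisterStore(Store) && !isRegisterLoad(Store));
  return 0;
}
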
diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
new file mode 100644
-index 0000000..96368e8
+index 0000000..b66ae87
--- /dev/null
+++ b/lib/Target/R600/AMDGPUInstrInfo.td
-@@ -0,0 +1,74 @@
+@@ -0,0 +1,82 @@
+//===-- AMDGPUInstrInfo.td - AMDGPU DAG nodes --------------*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -2014,12 +2089,20 @@ index 0000000..96368e8
+def AMDGPUurecip : SDNode<"AMDGPUISD::URECIP", SDTIntUnaryOp>;
+
+def fpow : SDNode<"ISD::FPOW", SDTFPBinOp>;
++
++def AMDGPUregister_load : SDNode<"AMDGPUISD::REGISTER_LOAD",
++ SDTypeProfile<1, 2, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
++ [SDNPHasChain, SDNPMayLoad]>;
++
++def AMDGPUregister_store : SDNode<"AMDGPUISD::REGISTER_STORE",
++ SDTypeProfile<0, 3, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
++ [SDNPHasChain, SDNPMayStore]>;
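
For reference, the SDTypeProfile arguments read <results, operands,
constraints>, and constraint indices count results before operands: the load
produces one result from an address (index 1, constrained to a pointer) and a
channel immediate (index 2, an integer), while the store produces no result
from a value plus the same address/channel pair. SDNPHasChain together with
SDNPMayLoad/SDNPMayStore keeps both nodes ordered against other memory
operations.
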
diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
new file mode 100644
-index 0000000..e634d20
+index 0000000..0559a5a
--- /dev/null
+++ b/lib/Target/R600/AMDGPUInstructions.td
-@@ -0,0 +1,190 @@
+@@ -0,0 +1,268 @@
+//===-- AMDGPUInstructions.td - Common instruction defs ---*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -2035,8 +2118,8 @@ index 0000000..e634d20
+//===----------------------------------------------------------------------===//
+
+class AMDGPUInst <dag outs, dag ins, string asm, list<dag> pattern> : Instruction {
-+ field bits<16> AMDILOp = 0;
-+ field bits<3> Gen = 0;
++ field bit isRegisterLoad = 0;
++ field bit isRegisterStore = 0;
+
+ let Namespace = "AMDGPU";
+ let OutOperandList = outs;
@@ -2044,8 +2127,9 @@ index 0000000..e634d20
+ let AsmString = asm;
+ let Pattern = pattern;
+ let Itinerary = NullALU;
-+ let TSFlags{42-40} = Gen;
-+ let TSFlags{63-48} = AMDILOp;
++
++ let TSFlags{63} = isRegisterLoad;
++ let TSFlags{62} = isRegisterStore;
+}
+
+class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
@@ -2123,7 +2207,9 @@ index 0000000..e634d20
+ [{return N->isExactlyValue(1.0);}]
+>;
+
-+let isCodeGenOnly = 1, isPseudo = 1, usesCustomInserter = 1 in {
++let isCodeGenOnly = 1, isPseudo = 1 in {
++
++let usesCustomInserter = 1 in {
+
+class CLAMP <RegisterClass rc> : AMDGPUShaderInst <
+ (outs rc:$dst),
@@ -2153,7 +2239,31 @@ index 0000000..e634d20
+ [(int_AMDGPU_shader_type imm:$type)]
+>;
+
-+} // End isCodeGenOnly = 1, isPseudo = 1, hasCustomInserter = 1
++} // usesCustomInserter = 1
++
++multiclass RegisterLoadStore <RegisterClass dstClass, Operand addrClass,
++ ComplexPattern addrPat> {
++ def RegisterLoad : AMDGPUShaderInst <
++ (outs dstClass:$dst),
++ (ins addrClass:$addr, i32imm:$chan),
++ "RegisterLoad $dst, $addr",
++ [(set (i32 dstClass:$dst), (AMDGPUregister_load addrPat:$addr,
++ (i32 timm:$chan)))]
++ > {
++ let isRegisterLoad = 1;
++ }
++
++ def RegisterStore : AMDGPUShaderInst <
++ (outs),
++ (ins dstClass:$val, addrClass:$addr, i32imm:$chan),
++ "RegisterStore $val, $addr",
++ [(AMDGPUregister_store (i32 dstClass:$val), addrPat:$addr, (i32 timm:$chan))]
++ > {
++ let isRegisterStore = 1;
++ }
++}
++
++} // End isCodeGenOnly = 1, isPseudo = 1
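
Each instantiation of RegisterLoadStore therefore yields a
RegisterLoad/RegisterStore pseudo pair whose isRegisterLoad/isRegisterStore
bits land in TSFlags{63}/{62} via the AMDGPUInst class above, which is exactly
what the new AMDGPUInstrInfo::isRegisterLoad/isRegisterStore predicates test.
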
+
+/* Generic helper patterns for intrinsics */
+/* -------------------------------------- */
@@ -2186,13 +2296,64 @@ index 0000000..e634d20
+>;
+
+// Vector Build pattern
++class Vector1_Build <ValueType vecType, RegisterClass vectorClass,
++ ValueType elemType, RegisterClass elemClass> : Pat <
++ (vecType (build_vector (elemType elemClass:$src))),
++ (vecType elemClass:$src)
++>;
++
++class Vector2_Build <ValueType vecType, RegisterClass vectorClass,
++ ValueType elemType, RegisterClass elemClass> : Pat <
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1))),
++ (INSERT_SUBREG (INSERT_SUBREG
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1)
++>;
++
+class Vector_Build <ValueType vecType, RegisterClass vectorClass,
+ ValueType elemType, RegisterClass elemClass> : Pat <
+ (vecType (build_vector (elemType elemClass:$x), (elemType elemClass:$y),
+ (elemType elemClass:$z), (elemType elemClass:$w))),
+ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
-+ (vecType (IMPLICIT_DEF)), elemClass:$x, sel_x), elemClass:$y, sel_y),
-+ elemClass:$z, sel_z), elemClass:$w, sel_w)
++ (vecType (IMPLICIT_DEF)), elemClass:$x, sub0), elemClass:$y, sub1),
++ elemClass:$z, sub2), elemClass:$w, sub3)
++>;
++
++class Vector8_Build <ValueType vecType, RegisterClass vectorClass,
++ ValueType elemType, RegisterClass elemClass> : Pat <
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
++ (elemType elemClass:$sub2), (elemType elemClass:$sub3),
++ (elemType elemClass:$sub4), (elemType elemClass:$sub5),
++ (elemType elemClass:$sub6), (elemType elemClass:$sub7))),
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
++ elemClass:$sub2, sub2), elemClass:$sub3, sub3),
++ elemClass:$sub4, sub4), elemClass:$sub5, sub5),
++ elemClass:$sub6, sub6), elemClass:$sub7, sub7)
++>;
++
++class Vector16_Build <ValueType vecType, RegisterClass vectorClass,
++ ValueType elemType, RegisterClass elemClass> : Pat <
++ (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
++ (elemType elemClass:$sub2), (elemType elemClass:$sub3),
++ (elemType elemClass:$sub4), (elemType elemClass:$sub5),
++ (elemType elemClass:$sub6), (elemType elemClass:$sub7),
++ (elemType elemClass:$sub8), (elemType elemClass:$sub9),
++ (elemType elemClass:$sub10), (elemType elemClass:$sub11),
++ (elemType elemClass:$sub12), (elemType elemClass:$sub13),
++ (elemType elemClass:$sub14), (elemType elemClass:$sub15))),
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++ (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
++ elemClass:$sub2, sub2), elemClass:$sub3, sub3),
++ elemClass:$sub4, sub4), elemClass:$sub5, sub5),
++ elemClass:$sub6, sub6), elemClass:$sub7, sub7),
++ elemClass:$sub8, sub8), elemClass:$sub9, sub9),
++ elemClass:$sub10, sub10), elemClass:$sub11, sub11),
++ elemClass:$sub12, sub12), elemClass:$sub13, sub13),
++ elemClass:$sub14, sub14), elemClass:$sub15, sub15)
+>;
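
All of the Vector*_Build variants follow one recipe: start from an
IMPLICIT_DEF of the vector type and INSERT_SUBREG each scalar into the
matching sub0...subN index generated by the foreach loop in
AMDGPURegisterInfo.td; only the number of nested inserts differs.
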
+
+// bitconvert pattern
@@ -2409,10 +2570,10 @@ index 0000000..d7d538e
+#endif //AMDGPU_MCINSTLOWER_H
diff --git a/lib/Target/R600/AMDGPURegisterInfo.cpp b/lib/Target/R600/AMDGPURegisterInfo.cpp
new file mode 100644
-index 0000000..eeafec8
+index 0000000..d62e57b
--- /dev/null
+++ b/lib/Target/R600/AMDGPURegisterInfo.cpp
-@@ -0,0 +1,51 @@
+@@ -0,0 +1,74 @@
+//===-- AMDGPURegisterInfo.cpp - AMDGPU Register Information -------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -2462,14 +2623,37 @@ index 0000000..eeafec8
+ return 0;
+}
+
++unsigned AMDGPURegisterInfo::getIndirectSubReg(unsigned IndirectIndex) const {
++
++ switch(IndirectIndex) {
++ case 0: return AMDGPU::sub0;
++ case 1: return AMDGPU::sub1;
++ case 2: return AMDGPU::sub2;
++ case 3: return AMDGPU::sub3;
++ case 4: return AMDGPU::sub4;
++ case 5: return AMDGPU::sub5;
++ case 6: return AMDGPU::sub6;
++ case 7: return AMDGPU::sub7;
++ case 8: return AMDGPU::sub8;
++ case 9: return AMDGPU::sub9;
++ case 10: return AMDGPU::sub10;
++ case 11: return AMDGPU::sub11;
++ case 12: return AMDGPU::sub12;
++ case 13: return AMDGPU::sub13;
++ case 14: return AMDGPU::sub14;
++ case 15: return AMDGPU::sub15;
++ default: llvm_unreachable("indirect index out of range");
++ }
++}
++
+#define GET_REGINFO_TARGET_DESC
+#include "AMDGPUGenRegisterInfo.inc"
diff --git a/lib/Target/R600/AMDGPURegisterInfo.h b/lib/Target/R600/AMDGPURegisterInfo.h
new file mode 100644
-index 0000000..76ee7ae
+index 0000000..5007ff5
--- /dev/null
+++ b/lib/Target/R600/AMDGPURegisterInfo.h
-@@ -0,0 +1,63 @@
+@@ -0,0 +1,65 @@
+//===-- AMDGPURegisterInfo.h - AMDGPURegisterInfo Interface -*- C++ -*-----===//
+//
+// The LLVM Compiler Infrastructure
@@ -2528,6 +2712,8 @@ index 0000000..76ee7ae
+ RegScavenger *RS) const;
+ unsigned getFrameRegister(const MachineFunction &MF) const;
+
++ unsigned getIndirectSubReg(unsigned IndirectIndex) const;
++
+};
+
+} // End namespace llvm
@@ -2535,10 +2721,10 @@ index 0000000..76ee7ae
+#endif // AMDIDSAREGISTERINFO_H
diff --git a/lib/Target/R600/AMDGPURegisterInfo.td b/lib/Target/R600/AMDGPURegisterInfo.td
new file mode 100644
-index 0000000..8181e02
+index 0000000..b5aca03
--- /dev/null
+++ b/lib/Target/R600/AMDGPURegisterInfo.td
-@@ -0,0 +1,22 @@
+@@ -0,0 +1,25 @@
+//===-- AMDGPURegisterInfo.td - AMDGPU register info -------*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -2553,20 +2739,23 @@ index 0000000..8181e02
+//===----------------------------------------------------------------------===//
+
+let Namespace = "AMDGPU" in {
-+ def sel_x : SubRegIndex;
-+ def sel_y : SubRegIndex;
-+ def sel_z : SubRegIndex;
-+ def sel_w : SubRegIndex;
++
++foreach Index = 0-15 in {
++ def sub#Index : SubRegIndex;
++}
++
++def INDIRECT_BASE_ADDR : Register <"INDIRECT_BASE_ADDR">;
++
+}
+
+include "R600RegisterInfo.td"
+include "SIRegisterInfo.td"
diff --git a/lib/Target/R600/AMDGPUStructurizeCFG.cpp b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
new file mode 100644
-index 0000000..22338b5
+index 0000000..a8c9621
--- /dev/null
+++ b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
-@@ -0,0 +1,714 @@
+@@ -0,0 +1,893 @@
+//===-- AMDGPUStructurizeCFG.cpp - ------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -2591,30 +2780,101 @@ index 0000000..22338b5
+#include "llvm/Analysis/RegionInfo.h"
+#include "llvm/Analysis/RegionPass.h"
+#include "llvm/Transforms/Utils/SSAUpdater.h"
++#include "llvm/Support/PatternMatch.h"
+
+using namespace llvm;
++using namespace llvm::PatternMatch;
+
+namespace {
+
+// Definition of the complex types used in this pass.
+
+typedef std::pair<BasicBlock *, Value *> BBValuePair;
-+typedef ArrayRef<BasicBlock*> BBVecRef;
+
+typedef SmallVector<RegionNode*, 8> RNVector;
+typedef SmallVector<BasicBlock*, 8> BBVector;
++typedef SmallVector<BranchInst*, 8> BranchVector;
+typedef SmallVector<BBValuePair, 2> BBValueVector;
+
++typedef SmallPtrSet<BasicBlock *, 8> BBSet;
++
+typedef DenseMap<PHINode *, BBValueVector> PhiMap;
++typedef DenseMap<DomTreeNode *, unsigned> DTN2UnsignedMap;
+typedef DenseMap<BasicBlock *, PhiMap> BBPhiMap;
+typedef DenseMap<BasicBlock *, Value *> BBPredicates;
+typedef DenseMap<BasicBlock *, BBPredicates> PredMap;
-+typedef DenseMap<BasicBlock *, unsigned> VisitedMap;
++typedef DenseMap<BasicBlock *, BasicBlock*> BB2BBMap;
++typedef DenseMap<BasicBlock *, BBVector> BB2BBVecMap;
+
+// The name for newly created blocks.
+
+static const char *FlowBlockName = "Flow";
+
++/// @brief Find the nearest common dominator for multiple BasicBlocks
++///
++/// Helper class for AMDGPUStructurizeCFG
++/// TODO: Maybe move into common code
++class NearestCommonDominator {
++
++ DominatorTree *DT;
++
++ DTN2UnsignedMap IndexMap;
++
++ BasicBlock *Result;
++ unsigned ResultIndex;
++ bool ExplicitMentioned;
++
++public:
++ /// \brief Start a new query
++ NearestCommonDominator(DominatorTree *DomTree) {
++ DT = DomTree;
++ Result = 0;
++ }
++
++ /// \brief Add BB to the resulting dominator
++ void addBlock(BasicBlock *BB, bool Remember = true) {
++
++ DomTreeNode *Node = DT->getNode(BB);
++
++ if (Result == 0) {
++ unsigned Numbering = 0;
++ for (;Node;Node = Node->getIDom())
++ IndexMap[Node] = ++Numbering;
++ Result = BB;
++ ResultIndex = 1;
++ ExplicitMentioned = Remember;
++ return;
++ }
++
++ for (;Node;Node = Node->getIDom())
++ if (IndexMap.count(Node))
++ break;
++ else
++ IndexMap[Node] = 0;
++
++ assert(Node && "Dominator tree invalid!");
++
++ unsigned Numbering = IndexMap[Node];
++ if (Numbering > ResultIndex) {
++ Result = Node->getBlock();
++ ResultIndex = Numbering;
++ ExplicitMentioned = Remember && (Result == BB);
++ } else if (Numbering == ResultIndex) {
++ ExplicitMentioned |= Remember;
++ }
++ }
++
++ /// \brief Is "Result" one of the BBs added with "Remember" = True?
++ bool wasResultExplicitMentioned() {
++ return ExplicitMentioned;
++ }
++
++ /// \brief Get the query result
++ BasicBlock *getResult() {
++ return Result;
++ }
++};
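
The numbering walk above is easier to see on a toy tree. A self-contained
model of the same two-phase idea, where Node/Parent stand in for
DomTreeNode/getIDom; the Remember/ExplicitMentioned bookkeeping and the
zero-marking of climbed nodes (a shortcut for later queries) are omitted:

#include <cassert>
#include <map>

struct Node { Node *Parent = nullptr; }; // Parent plays the role of getIDom()

struct NearestCommonDomModel {
  std::map<const Node *, unsigned> Index;
  const Node *Result = nullptr;
  unsigned ResultIndex = 0;

  void addBlock(const Node *N) {
    if (!Result) {
      // First block: number its whole dominator chain, 1 at the block
      // itself and growing towards the root.
      unsigned Numbering = 0;
      for (const Node *I = N; I; I = I->Parent)
        Index[I] = ++Numbering;
      Result = N;
      ResultIndex = 1;
      return;
    }
    // Later blocks: climb until we meet the first chain. A higher number
    // there is a node closer to the root, i.e. a new common dominator.
    const Node *I = N;
    while (I && !Index.count(I))
      I = I->Parent;
    assert(I && "blocks must share a common ancestor");
    if (Index[I] > ResultIndex) {
      Result = I;
      ResultIndex = Index[I];
    }
  }
};

int main() {
  Node Root, A, B;   // Root dominates both A and B
  A.Parent = &Root;
  B.Parent = &Root;
  NearestCommonDomModel NCD;
  NCD.addBlock(&A);
  NCD.addBlock(&B);
  assert(NCD.Result == &Root);
  return 0;
}
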
++
+/// @brief Transforms the control flow graph on one single entry/exit region
+/// at a time.
+///
@@ -2675,45 +2935,62 @@ index 0000000..22338b5
+ DominatorTree *DT;
+
+ RNVector Order;
-+ VisitedMap Visited;
-+ PredMap Predicates;
++ BBSet Visited;
++
+ BBPhiMap DeletedPhis;
-+ BBVector FlowsInserted;
++ BB2BBVecMap AddedPhis;
++
++ PredMap Predicates;
++ BranchVector Conditions;
++
++ BB2BBMap Loops;
++ PredMap LoopPreds;
++ BranchVector LoopConds;
+
-+ BasicBlock *LoopStart;
-+ BasicBlock *LoopEnd;
-+ BBPredicates LoopPred;
++ RegionNode *PrevNode;
+
+ void orderNodes();
+
-+ void buildPredicate(BranchInst *Term, unsigned Idx,
-+ BBPredicates &Pred, bool Invert);
++ void analyzeLoops(RegionNode *N);
+
-+ void analyzeBlock(BasicBlock *BB);
++ Value *invert(Value *Condition);
+
-+ void analyzeLoop(BasicBlock *BB, unsigned &LoopIdx);
++ Value *buildCondition(BranchInst *Term, unsigned Idx, bool Invert);
++
++ void gatherPredicates(RegionNode *N);
+
+ void collectInfos();
+
-+ bool dominatesPredicates(BasicBlock *A, BasicBlock *B);
++ void insertConditions(bool Loops);
++
++ void delPhiValues(BasicBlock *From, BasicBlock *To);
++
++ void addPhiValues(BasicBlock *From, BasicBlock *To);
++
++ void setPhiValues();
+
+ void killTerminator(BasicBlock *BB);
+
-+ RegionNode *skipChained(RegionNode *Node);
++ void changeExit(RegionNode *Node, BasicBlock *NewExit,
++ bool IncludeDominator);
+
-+ void delPhiValues(BasicBlock *From, BasicBlock *To);
++ BasicBlock *getNextFlow(BasicBlock *Dominator);
+
-+ void addPhiValues(BasicBlock *From, BasicBlock *To);
++ BasicBlock *needPrefix(bool NeedEmpty);
+
-+ BasicBlock *getNextFlow(BasicBlock *Prev);
++ BasicBlock *needPostfix(BasicBlock *Flow, bool ExitUseAllowed);
+
-+ bool isPredictableTrue(BasicBlock *Prev, BasicBlock *Node);
++ void setPrevNode(BasicBlock *BB);
+
-+ BasicBlock *wireFlowBlock(BasicBlock *Prev, RegionNode *Node);
++ bool dominatesPredicates(BasicBlock *BB, RegionNode *Node);
+
-+ void createFlow();
++ bool isPredictableTrue(RegionNode *Node);
++
++ void wireFlow(bool ExitUseAllowed, BasicBlock *LoopEnd);
+
-+ void insertConditions();
++ void handleLoops(bool ExitUseAllowed, BasicBlock *LoopEnd);
++
++ void createFlow();
+
+ void rebuildSSA();
+
@@ -2767,212 +3044,214 @@ index 0000000..22338b5
+ }
+}
+
-+/// \brief Build blocks and loop predicates
-+void AMDGPUStructurizeCFG::buildPredicate(BranchInst *Term, unsigned Idx,
-+ BBPredicates &Pred, bool Invert) {
-+ Value *True = Invert ? BoolFalse : BoolTrue;
-+ Value *False = Invert ? BoolTrue : BoolFalse;
++/// \brief Determine the end of the loops
++void AMDGPUStructurizeCFG::analyzeLoops(RegionNode *N) {
+
-+ RegionInfo *RI = ParentRegion->getRegionInfo();
-+ BasicBlock *BB = Term->getParent();
++ if (N->isSubRegion()) {
++ // Test for exit as back edge
++ BasicBlock *Exit = N->getNodeAs<Region>()->getExit();
++ if (Visited.count(Exit))
++ Loops[Exit] = N->getEntry();
++
++ } else {
++ // Test for successors as back edges
++ BasicBlock *BB = N->getNodeAs<BasicBlock>();
++ BranchInst *Term = cast<BranchInst>(BB->getTerminator());
+
-+ // Handle the case where multiple regions start at the same block
-+ Region *R = BB != ParentRegion->getEntry() ?
-+ RI->getRegionFor(BB) : ParentRegion;
++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
++ BasicBlock *Succ = Term->getSuccessor(i);
+
-+ if (R == ParentRegion) {
-+ // It's a top level block in our region
-+ Value *Cond = True;
-+ if (Term->isConditional()) {
-+ BasicBlock *Other = Term->getSuccessor(!Idx);
++ if (Visited.count(Succ))
++ Loops[Succ] = BB;
++ }
++ }
++}
+
-+ if (Visited.count(Other)) {
-+ if (!Pred.count(Other))
-+ Pred[Other] = False;
++/// \brief Invert the given condition
++Value *AMDGPUStructurizeCFG::invert(Value *Condition) {
+
-+ if (!Pred.count(BB))
-+ Pred[BB] = True;
-+ return;
-+ }
-+ Cond = Term->getCondition();
++ // First: Check if it's a constant
++ if (Condition == BoolTrue)
++ return BoolFalse;
+
-+ if (Idx != Invert)
-+ Cond = BinaryOperator::CreateNot(Cond, "", Term);
-+ }
++ if (Condition == BoolFalse)
++ return BoolTrue;
+
-+ Pred[BB] = Cond;
++ if (Condition == BoolUndef)
++ return BoolUndef;
+
-+ } else if (ParentRegion->contains(R)) {
-+ // It's a block in a sub region
-+ while(R->getParent() != ParentRegion)
-+ R = R->getParent();
++ // Second: If the condition is already inverted, return the original value
++ if (match(Condition, m_Not(m_Value(Condition))))
++ return Condition;
+
-+ Pred[R->getEntry()] = True;
++ // Third: Check all the users for an invert
++ BasicBlock *Parent = cast<Instruction>(Condition)->getParent();
++ for (Value::use_iterator I = Condition->use_begin(),
++ E = Condition->use_end(); I != E; ++I) {
+
-+ } else {
-+ // It's a branch from outside into our parent region
-+ Pred[BB] = True;
++ Instruction *User = dyn_cast<Instruction>(*I);
++ if (!User || User->getParent() != Parent)
++ continue;
++
++ if (match(*I, m_Not(m_Specific(Condition))))
++ return *I;
+ }
-+}
+
-+/// \brief Analyze the successors of each block and build up predicates
-+void AMDGPUStructurizeCFG::analyzeBlock(BasicBlock *BB) {
-+ pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
-+ BBPredicates &Pred = Predicates[BB];
++ // Last option: Create a new instruction
++ return BinaryOperator::CreateNot(Condition, "", Parent->getTerminator());
++}
+
-+ for (; PI != PE; ++PI) {
-+ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
++/// \brief Build the condition for one edge
++Value *AMDGPUStructurizeCFG::buildCondition(BranchInst *Term, unsigned Idx,
++ bool Invert) {
++ Value *Cond = Invert ? BoolFalse : BoolTrue;
++ if (Term->isConditional()) {
++ Cond = Term->getCondition();
+
-+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
-+ BasicBlock *Succ = Term->getSuccessor(i);
-+ if (Succ != BB)
-+ continue;
-+ buildPredicate(Term, i, Pred, false);
-+ }
++ if (Idx != Invert)
++ Cond = invert(Cond);
+ }
++ return Cond;
+}
+
-+/// \brief Analyze the conditions leading to loop to a previous block
-+void AMDGPUStructurizeCFG::analyzeLoop(BasicBlock *BB, unsigned &LoopIdx) {
-+ BranchInst *Term = cast<BranchInst>(BB->getTerminator());
++/// \brief Analyze the predecessors of each block and build up predicates
++void AMDGPUStructurizeCFG::gatherPredicates(RegionNode *N) {
+
-+ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
-+ BasicBlock *Succ = Term->getSuccessor(i);
++ RegionInfo *RI = ParentRegion->getRegionInfo();
++ BasicBlock *BB = N->getEntry();
++ BBPredicates &Pred = Predicates[BB];
++ BBPredicates &LPred = LoopPreds[BB];
++
++ for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
++ PI != PE; ++PI) {
+
-+ // Ignore it if it's not a back edge
-+ if (!Visited.count(Succ))
++ // Ignore it if it's a branch from outside into our region entry
++ if (!ParentRegion->contains(*PI))
+ continue;
+
-+ buildPredicate(Term, i, LoopPred, true);
++ Region *R = RI->getRegionFor(*PI);
++ if (R == ParentRegion) {
++
++ // It's a top level block in our region
++ BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
++ for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
++ BasicBlock *Succ = Term->getSuccessor(i);
++ if (Succ != BB)
++ continue;
++
++ if (Visited.count(*PI)) {
++ // Normal forward edge
++ if (Term->isConditional()) {
++ // Try to treat it like an ELSE block
++ BasicBlock *Other = Term->getSuccessor(!i);
++ if (Visited.count(Other) && !Loops.count(Other) &&
++ !Pred.count(Other) && !Pred.count(*PI)) {
++
++ Pred[Other] = BoolFalse;
++ Pred[*PI] = BoolTrue;
++ continue;
++ }
++ }
++ Pred[*PI] = buildCondition(Term, i, false);
++
++ } else {
++ // Back edge
++ LPred[*PI] = buildCondition(Term, i, true);
++ }
++ }
++
++ } else {
++
++ // It's an exit from a sub region
++ while(R->getParent() != ParentRegion)
++ R = R->getParent();
++
++ // Edge from inside a subregion to its entry, ignore it
++ if (R == N)
++ continue;
+
-+ LoopEnd = BB;
-+ if (Visited[Succ] < LoopIdx) {
-+ LoopIdx = Visited[Succ];
-+ LoopStart = Succ;
++ BasicBlock *Entry = R->getEntry();
++ if (Visited.count(Entry))
++ Pred[Entry] = BoolTrue;
++ else
++ LPred[Entry] = BoolFalse;
+ }
+ }
+}
+
+/// \brief Collect various loop and predicate infos
+void AMDGPUStructurizeCFG::collectInfos() {
-+ unsigned Number = 0, LoopIdx = ~0;
+
+ // Reset predicate
+ Predicates.clear();
+
+ // and loop infos
-+ LoopStart = LoopEnd = 0;
-+ LoopPred.clear();
++ Loops.clear();
++ LoopPreds.clear();
++
++ // Reset the visited nodes
++ Visited.clear();
+
-+ RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
-+ for (Visited.clear(); OI != OE; Visited[(*OI++)->getEntry()] = ++Number) {
++ for (RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
++ OI != OE; ++OI) {
+
+ // Analyze all the conditions leading to a node
-+ analyzeBlock((*OI)->getEntry());
++ gatherPredicates(*OI);
+
-+ if ((*OI)->isSubRegion())
-+ continue;
++ // Remember that we've seen this node
++ Visited.insert((*OI)->getEntry());
+
-+ // Find the first/last loop nodes and loop predicates
-+ analyzeLoop((*OI)->getNodeAs<BasicBlock>(), LoopIdx);
++ // Find the last back edges
++ analyzeLoops(*OI);
+ }
+}
+
-+/// \brief Does A dominate all the predicates of B ?
-+bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *A, BasicBlock *B) {
-+ BBPredicates &Preds = Predicates[B];
-+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+ PI != PE; ++PI) {
++/// \brief Insert the missing branch conditions
++void AMDGPUStructurizeCFG::insertConditions(bool Loops) {
++ BranchVector &Conds = Loops ? LoopConds : Conditions;
++ Value *Default = Loops ? BoolTrue : BoolFalse;
++ SSAUpdater PhiInserter;
+
-+ if (!DT->dominates(A, PI->first))
-+ return false;
-+ }
-+ return true;
-+}
++ for (BranchVector::iterator I = Conds.begin(),
++ E = Conds.end(); I != E; ++I) {
+
-+/// \brief Remove phi values from all successors and the remove the terminator.
-+void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
-+ TerminatorInst *Term = BB->getTerminator();
-+ if (!Term)
-+ return;
++ BranchInst *Term = *I;
++ assert(Term->isConditional());
+
-+ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
-+ SI != SE; ++SI) {
++ BasicBlock *Parent = Term->getParent();
++ BasicBlock *SuccTrue = Term->getSuccessor(0);
++ BasicBlock *SuccFalse = Term->getSuccessor(1);
+
-+ delPhiValues(BB, *SI);
-+ }
++ PhiInserter.Initialize(Boolean, "");
++ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), Default);
++ PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
+
-+ Term->eraseFromParent();
-+}
++ BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
+
-+/// First: Skip forward to the first region node that either isn't a subregion or not
-+/// dominating it's exit, remove all the skipped nodes from the node order.
-+///
-+/// Second: Handle the first successor directly if the resulting nodes successor
-+/// predicates are still dominated by the original entry
-+RegionNode *AMDGPUStructurizeCFG::skipChained(RegionNode *Node) {
-+ BasicBlock *Entry = Node->getEntry();
++ NearestCommonDominator Dominator(DT);
++ Dominator.addBlock(Parent, false);
+
-+ // Skip forward as long as it is just a linear flow
-+ while (true) {
-+ BasicBlock *Entry = Node->getEntry();
-+ BasicBlock *Exit;
++ Value *ParentValue = 0;
++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++ PI != PE; ++PI) {
+
-+ if (Node->isSubRegion()) {
-+ Exit = Node->getNodeAs<Region>()->getExit();
-+ } else {
-+ TerminatorInst *Term = Entry->getTerminator();
-+ if (Term->getNumSuccessors() != 1)
++ if (PI->first == Parent) {
++ ParentValue = PI->second;
+ break;
-+ Exit = Term->getSuccessor(0);
++ }
++ PhiInserter.AddAvailableValue(PI->first, PI->second);
++ Dominator.addBlock(PI->first);
+ }
+
-+ // It's a back edge, break here so we can insert a loop node
-+ if (!Visited.count(Exit))
-+ return Node;
-+
-+ // More than node edges are pointing to exit
-+ if (!DT->dominates(Entry, Exit))
-+ return Node;
-+
-+ RegionNode *Next = ParentRegion->getNode(Exit);
-+ RNVector::iterator I = std::find(Order.begin(), Order.end(), Next);
-+ assert(I != Order.end());
-+
-+ Visited.erase(Next->getEntry());
-+ Order.erase(I);
-+ Node = Next;
-+ }
++ if (ParentValue) {
++ Term->setCondition(ParentValue);
++ } else {
++ if (!Dominator.wasResultExplicitMentioned())
++ PhiInserter.AddAvailableValue(Dominator.getResult(), Default);
+
-+ BasicBlock *BB = Node->getEntry();
-+ TerminatorInst *Term = BB->getTerminator();
-+ if (Term->getNumSuccessors() != 2)
-+ return Node;
-+
-+ // Our node has exactly two succesors, check if we can handle
-+ // any of them directly
-+ BasicBlock *Succ = Term->getSuccessor(0);
-+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ)) {
-+ Succ = Term->getSuccessor(1);
-+ if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ))
-+ return Node;
-+ } else {
-+ BasicBlock *Succ2 = Term->getSuccessor(1);
-+ if (Visited.count(Succ2) && Visited[Succ] > Visited[Succ2] &&
-+ dominatesPredicates(Entry, Succ2))
-+ Succ = Succ2;
++ Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
++ }
+ }
-+
-+ RegionNode *Next = ParentRegion->getNode(Succ);
-+ RNVector::iterator E = Order.end();
-+ RNVector::iterator I = std::find(Order.begin(), E, Next);
-+ assert(I != E);
-+
-+ killTerminator(BB);
-+ FlowsInserted.push_back(BB);
-+ Visited.erase(Succ);
-+ Order.erase(I);
-+ return ParentRegion->getNode(wireFlowBlock(BB, Next));
+}
+
+/// \brief Remove all PHI values coming from "From" into "To" and remember
@@ -2990,224 +3269,306 @@ index 0000000..22338b5
+ }
+}
+
-+/// \brief Add the PHI values back once we knew the new predecessor
++/// \brief Add a dummy PHI value as soon as we know the new predecessor
+void AMDGPUStructurizeCFG::addPhiValues(BasicBlock *From, BasicBlock *To) {
-+ if (!DeletedPhis.count(To))
-+ return;
++ for (BasicBlock::iterator I = To->begin(), E = To->end();
++ I != E && isa<PHINode>(*I);) {
+
-+ PhiMap &Map = DeletedPhis[To];
++ PHINode &Phi = cast<PHINode>(*I++);
++ Value *Undef = UndefValue::get(Phi.getType());
++ Phi.addIncoming(Undef, From);
++ }
++ AddedPhis[To].push_back(From);
++}
++
++/// \brief Add the real PHI value as soon as everything is set up
++void AMDGPUStructurizeCFG::setPhiValues() {
++
+ SSAUpdater Updater;
++ for (BB2BBVecMap::iterator AI = AddedPhis.begin(), AE = AddedPhis.end();
++ AI != AE; ++AI) {
+
-+ for (PhiMap::iterator I = Map.begin(), E = Map.end(); I != E; ++I) {
++ BasicBlock *To = AI->first;
++ BBVector &From = AI->second;
+
-+ PHINode *Phi = I->first;
-+ Updater.Initialize(Phi->getType(), "");
-+ BasicBlock *Fallback = To;
-+ bool HaveFallback = false;
++ if (!DeletedPhis.count(To))
++ continue;
+
-+ for (BBValueVector::iterator VI = I->second.begin(), VE = I->second.end();
-+ VI != VE; ++VI) {
++ PhiMap &Map = DeletedPhis[To];
++ for (PhiMap::iterator PI = Map.begin(), PE = Map.end();
++ PI != PE; ++PI) {
+
-+ Updater.AddAvailableValue(VI->first, VI->second);
-+ BasicBlock *Dom = DT->findNearestCommonDominator(Fallback, VI->first);
-+ if (Dom == VI->first)
-+ HaveFallback = true;
-+ else if (Dom != Fallback)
-+ HaveFallback = false;
-+ Fallback = Dom;
-+ }
-+ if (!HaveFallback) {
++ PHINode *Phi = PI->first;
+ Value *Undef = UndefValue::get(Phi->getType());
-+ Updater.AddAvailableValue(Fallback, Undef);
++ Updater.Initialize(Phi->getType(), "");
++ Updater.AddAvailableValue(&Func->getEntryBlock(), Undef);
++ Updater.AddAvailableValue(To, Undef);
++
++ NearestCommonDominator Dominator(DT);
++ Dominator.addBlock(To, false);
++ for (BBValueVector::iterator VI = PI->second.begin(),
++ VE = PI->second.end(); VI != VE; ++VI) {
++
++ Updater.AddAvailableValue(VI->first, VI->second);
++ Dominator.addBlock(VI->first);
++ }
++
++ if (!Dominator.wasResultExplicitMentioned())
++ Updater.AddAvailableValue(Dominator.getResult(), Undef);
++
++ for (BBVector::iterator FI = From.begin(), FE = From.end();
++ FI != FE; ++FI) {
++
++ int Idx = Phi->getBasicBlockIndex(*FI);
++ assert(Idx != -1);
++ Phi->setIncomingValue(Idx, Updater.GetValueAtEndOfBlock(*FI));
++ }
++ }
++
++ DeletedPhis.erase(To);
++ }
++ assert(DeletedPhis.empty());
++}
++
++/// \brief Remove phi values from all successors and then remove the terminator.
++void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
++ TerminatorInst *Term = BB->getTerminator();
++ if (!Term)
++ return;
++
++ for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
++ SI != SE; ++SI) {
++
++ delPhiValues(BB, *SI);
++ }
++
++ Term->eraseFromParent();
++}
++
++/// \brief Let node exit(s) point to NewExit
++void AMDGPUStructurizeCFG::changeExit(RegionNode *Node, BasicBlock *NewExit,
++ bool IncludeDominator) {
++
++ if (Node->isSubRegion()) {
++ Region *SubRegion = Node->getNodeAs<Region>();
++ BasicBlock *OldExit = SubRegion->getExit();
++ BasicBlock *Dominator = 0;
++
++ // Find all the edges from the sub region to the exit
++ for (pred_iterator I = pred_begin(OldExit), E = pred_end(OldExit);
++ I != E;) {
++
++ BasicBlock *BB = *I++;
++ if (!SubRegion->contains(BB))
++ continue;
++
++ // Modify the edges to point to the new exit
++ delPhiValues(BB, OldExit);
++ BB->getTerminator()->replaceUsesOfWith(OldExit, NewExit);
++ addPhiValues(BB, NewExit);
++
++ // Find the new dominator (if requested)
++ if (IncludeDominator) {
++ if (!Dominator)
++ Dominator = BB;
++ else
++ Dominator = DT->findNearestCommonDominator(Dominator, BB);
++ }
+ }
+
-+ Phi->addIncoming(Updater.GetValueAtEndOfBlock(From), From);
++ // Change the dominator (if requested)
++ if (Dominator)
++ DT->changeImmediateDominator(NewExit, Dominator);
++
++ // Update the region info
++ SubRegion->replaceExit(NewExit);
++
++ } else {
++ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
++ killTerminator(BB);
++ BranchInst::Create(NewExit, BB);
++ addPhiValues(BB, NewExit);
++ if (IncludeDominator)
++ DT->changeImmediateDominator(NewExit, BB);
+ }
-+ DeletedPhis.erase(To);
+}
+
+/// \brief Create a new flow node and update dominator tree and region info
-+BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Prev) {
++BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Dominator) {
+ LLVMContext &Context = Func->getContext();
+ BasicBlock *Insert = Order.empty() ? ParentRegion->getExit() :
+ Order.back()->getEntry();
+ BasicBlock *Flow = BasicBlock::Create(Context, FlowBlockName,
+ Func, Insert);
-+ DT->addNewBlock(Flow, Prev);
++ DT->addNewBlock(Flow, Dominator);
+ ParentRegion->getRegionInfo()->setRegionFor(Flow, ParentRegion);
-+ FlowsInserted.push_back(Flow);
+ return Flow;
+}
+
++/// \brief Create a new flow node, or reuse the previous node as one
++BasicBlock *AMDGPUStructurizeCFG::needPrefix(bool NeedEmpty) {
++
++ BasicBlock *Entry = PrevNode->getEntry();
++
++ if (!PrevNode->isSubRegion()) {
++ killTerminator(Entry);
++ if (!NeedEmpty || Entry->getFirstInsertionPt() == Entry->end())
++ return Entry;
++
++ }
++
++ // create a new flow node
++ BasicBlock *Flow = getNextFlow(Entry);
++
++ // and wire it up
++ changeExit(PrevNode, Flow, true);
++ PrevNode = ParentRegion->getBBNode(Flow);
++ return Flow;
++}
++
++/// \brief Returns the region exit if possible, otherwise just a new flow node
++BasicBlock *AMDGPUStructurizeCFG::needPostfix(BasicBlock *Flow,
++ bool ExitUseAllowed) {
++
++ if (Order.empty() && ExitUseAllowed) {
++ BasicBlock *Exit = ParentRegion->getExit();
++ DT->changeImmediateDominator(Exit, Flow);
++ addPhiValues(Flow, Exit);
++ return Exit;
++ }
++ return getNextFlow(Flow);
++}
++
++/// \brief Set the previous node
++void AMDGPUStructurizeCFG::setPrevNode(BasicBlock *BB) {
++ PrevNode = ParentRegion->contains(BB) ? ParentRegion->getBBNode(BB) : 0;
++}
++
++/// \brief Does BB dominate all the predicates of Node?
++bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *BB, RegionNode *Node) {
++ BBPredicates &Preds = Predicates[Node->getEntry()];
++ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++ PI != PE; ++PI) {
++
++ if (!DT->dominates(BB, PI->first))
++ return false;
++ }
++ return true;
++}
++
+/// \brief Can we predict that this node will always be called?
-+bool AMDGPUStructurizeCFG::isPredictableTrue(BasicBlock *Prev,
-+ BasicBlock *Node) {
-+ BBPredicates &Preds = Predicates[Node];
++bool AMDGPUStructurizeCFG::isPredictableTrue(RegionNode *Node) {
++
++ BBPredicates &Preds = Predicates[Node->getEntry()];
+ bool Dominated = false;
+
++ // Region entry is always true
++ if (PrevNode == 0)
++ return true;
++
+ for (BBPredicates::iterator I = Preds.begin(), E = Preds.end();
+ I != E; ++I) {
+
+ if (I->second != BoolTrue)
+ return false;
+
-+ if (!Dominated && DT->dominates(I->first, Prev))
++ if (!Dominated && DT->dominates(I->first, PrevNode->getEntry()))
+ Dominated = true;
+ }
++
++ // TODO: The dominator check is too strict
+ return Dominated;
+}
+
-+/// \brief Wire up the new control flow by inserting or updating the branch
-+/// instructions at node exits
-+BasicBlock *AMDGPUStructurizeCFG::wireFlowBlock(BasicBlock *Prev,
-+ RegionNode *Node) {
-+ BasicBlock *Entry = Node->getEntry();
-+
-+ if (LoopStart == Entry) {
-+ LoopStart = Prev;
-+ LoopPred[Prev] = BoolTrue;
-+ }
++/// Take one node from the order vector and wire it up
++void AMDGPUStructurizeCFG::wireFlow(bool ExitUseAllowed,
++ BasicBlock *LoopEnd) {
+
-+ // Wire it up temporary, skipChained may recurse into us
-+ BranchInst::Create(Entry, Prev);
-+ DT->changeImmediateDominator(Entry, Prev);
-+ addPhiValues(Prev, Entry);
++ RegionNode *Node = Order.pop_back_val();
++ Visited.insert(Node->getEntry());
+
-+ Node = skipChained(Node);
++ if (isPredictableTrue(Node)) {
++ // Just a linear flow
++ if (PrevNode) {
++ changeExit(PrevNode, Node->getEntry(), true);
++ }
++ PrevNode = Node;
+
-+ BasicBlock *Next = getNextFlow(Prev);
-+ if (!isPredictableTrue(Prev, Entry)) {
-+ // Let Prev point to entry and next block
-+ Prev->getTerminator()->eraseFromParent();
-+ BranchInst::Create(Entry, Next, BoolUndef, Prev);
+ } else {
-+ DT->changeImmediateDominator(Next, Entry);
-+ }
++ // Insert extra prefix node (or reuse last one)
++ BasicBlock *Flow = needPrefix(false);
+
-+ // Let node exit(s) point to next block
-+ if (Node->isSubRegion()) {
-+ Region *SubRegion = Node->getNodeAs<Region>();
-+ BasicBlock *Exit = SubRegion->getExit();
++ // Insert extra postfix node (or use exit instead)
++ BasicBlock *Entry = Node->getEntry();
++ BasicBlock *Next = needPostfix(Flow, ExitUseAllowed);
+
-+ // Find all the edges from the sub region to the exit
-+ BBVector ToDo;
-+ for (pred_iterator I = pred_begin(Exit), E = pred_end(Exit); I != E; ++I) {
-+ if (SubRegion->contains(*I))
-+ ToDo.push_back(*I);
-+ }
++ // let it point to entry and next block
++ Conditions.push_back(BranchInst::Create(Entry, Next, BoolUndef, Flow));
++ addPhiValues(Flow, Entry);
++ DT->changeImmediateDominator(Entry, Flow);
+
-+ // Modify the edges to point to the new flow block
-+ for (BBVector::iterator I = ToDo.begin(), E = ToDo.end(); I != E; ++I) {
-+ delPhiValues(*I, Exit);
-+ TerminatorInst *Term = (*I)->getTerminator();
-+ Term->replaceUsesOfWith(Exit, Next);
++ PrevNode = Node;
++ while (!Order.empty() && !Visited.count(LoopEnd) &&
++ dominatesPredicates(Entry, Order.back())) {
++ handleLoops(false, LoopEnd);
+ }
+
-+ // Update the region info
-+ SubRegion->replaceExit(Next);
-+
-+ } else {
-+ BasicBlock *BB = Node->getNodeAs<BasicBlock>();
-+ killTerminator(BB);
-+ BranchInst::Create(Next, BB);
-+
-+ if (BB == LoopEnd)
-+ LoopEnd = 0;
++ changeExit(PrevNode, Next, false);
++ setPrevNode(Next);
+ }
-+
-+ return Next;
+}
+
-+/// Destroy node order and visited map, build up flow order instead.
-+/// After this function control flow looks like it should be, but
-+/// branches only have undefined conditions.
-+void AMDGPUStructurizeCFG::createFlow() {
-+ DeletedPhis.clear();
-+
-+ BasicBlock *Prev = Order.pop_back_val()->getEntry();
-+ assert(Prev == ParentRegion->getEntry() && "Incorrect node order!");
-+ Visited.erase(Prev);
-+
-+ if (LoopStart == Prev) {
-+ // Loop starts at entry, split entry so that we can predicate it
-+ BasicBlock::iterator Insert = Prev->getFirstInsertionPt();
-+ BasicBlock *Split = Prev->splitBasicBlock(Insert, FlowBlockName);
-+ DT->addNewBlock(Split, Prev);
-+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+ Predicates[Split] = Predicates[Prev];
-+ Order.push_back(ParentRegion->getBBNode(Split));
-+ LoopPred[Prev] = BoolTrue;
-+
-+ } else if (LoopStart == Order.back()->getEntry()) {
-+ // Loop starts behind entry, split entry so that we can jump to it
-+ Instruction *Term = Prev->getTerminator();
-+ BasicBlock *Split = Prev->splitBasicBlock(Term, FlowBlockName);
-+ DT->addNewBlock(Split, Prev);
-+ ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+ Prev = Split;
-+ }
-+
-+ killTerminator(Prev);
-+ FlowsInserted.clear();
-+ FlowsInserted.push_back(Prev);
++void AMDGPUStructurizeCFG::handleLoops(bool ExitUseAllowed,
++ BasicBlock *LoopEnd) {
++ RegionNode *Node = Order.back();
++ BasicBlock *LoopStart = Node->getEntry();
+
-+ while (!Order.empty()) {
-+ RegionNode *Node = Order.pop_back_val();
-+ Visited.erase(Node->getEntry());
-+ Prev = wireFlowBlock(Prev, Node);
-+ if (LoopStart && !LoopEnd) {
-+ // Create an extra loop end node
-+ LoopEnd = Prev;
-+ Prev = getNextFlow(LoopEnd);
-+ BranchInst::Create(Prev, LoopStart, BoolUndef, LoopEnd);
-+ addPhiValues(LoopEnd, LoopStart);
-+ }
++ if (!Loops.count(LoopStart)) {
++ wireFlow(ExitUseAllowed, LoopEnd);
++ return;
+ }
+
-+ BasicBlock *Exit = ParentRegion->getExit();
-+ BranchInst::Create(Exit, Prev);
-+ addPhiValues(Prev, Exit);
-+ if (DT->dominates(ParentRegion->getEntry(), Exit))
-+ DT->changeImmediateDominator(Exit, Prev);
-+
-+ if (LoopStart && LoopEnd) {
-+ BBVector::iterator FI = std::find(FlowsInserted.begin(),
-+ FlowsInserted.end(),
-+ LoopStart);
-+ for (; *FI != LoopEnd; ++FI) {
-+ addPhiValues(*FI, (*FI)->getTerminator()->getSuccessor(0));
-+ }
++ if (!isPredictableTrue(Node))
++ LoopStart = needPrefix(true);
++
++ LoopEnd = Loops[Node->getEntry()];
++ wireFlow(false, LoopEnd);
++ while (!Visited.count(LoopEnd)) {
++ handleLoops(false, LoopEnd);
+ }
+
-+ assert(Order.empty());
-+ assert(Visited.empty());
-+ assert(DeletedPhis.empty());
++ // Create an extra loop end node
++ LoopEnd = needPrefix(false);
++ BasicBlock *Next = needPostfix(LoopEnd, ExitUseAllowed);
++ LoopConds.push_back(BranchInst::Create(Next, LoopStart,
++ BoolUndef, LoopEnd));
++ addPhiValues(LoopEnd, LoopStart);
++ setPrevNode(Next);
+}
+
-+/// \brief Insert the missing branch conditions
-+void AMDGPUStructurizeCFG::insertConditions() {
-+ SSAUpdater PhiInserter;
-+
-+ for (BBVector::iterator FI = FlowsInserted.begin(), FE = FlowsInserted.end();
-+ FI != FE; ++FI) {
-+
-+ BranchInst *Term = cast<BranchInst>((*FI)->getTerminator());
-+ if (Term->isUnconditional())
-+ continue;
++/// After this function control flow looks like it should be, but
++/// branches and PHI nodes only have undefined conditions.
++void AMDGPUStructurizeCFG::createFlow() {
+
-+ PhiInserter.Initialize(Boolean, "");
-+ PhiInserter.AddAvailableValue(&Func->getEntryBlock(), BoolFalse);
++ BasicBlock *Exit = ParentRegion->getExit();
++ bool EntryDominatesExit = DT->dominates(ParentRegion->getEntry(), Exit);
+
-+ BasicBlock *Succ = Term->getSuccessor(0);
-+ BBPredicates &Preds = (*FI == LoopEnd) ? LoopPred : Predicates[Succ];
-+ for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+ PI != PE; ++PI) {
++ DeletedPhis.clear();
++ AddedPhis.clear();
++ Conditions.clear();
++ LoopConds.clear();
+
-+ PhiInserter.AddAvailableValue(PI->first, PI->second);
-+ }
++ PrevNode = 0;
++ Visited.clear();
+
-+ Term->setCondition(PhiInserter.GetValueAtEndOfBlock(*FI));
++ while (!Order.empty()) {
++ handleLoops(EntryDominatesExit, 0);
+ }
++
++ if (PrevNode)
++ changeExit(PrevNode, Exit, EntryDominatesExit);
++ else
++ assert(EntryDominatesExit);
+}
+
+/// Handle a rare case where the disintegrated nodes' instructions
@@ -3265,14 +3626,21 @@ index 0000000..22338b5
+ orderNodes();
+ collectInfos();
+ createFlow();
-+ insertConditions();
++ insertConditions(false);
++ insertConditions(true);
++ setPhiValues();
+ rebuildSSA();
+
++ // Cleanup
+ Order.clear();
+ Visited.clear();
-+ Predicates.clear();
+ DeletedPhis.clear();
-+ FlowsInserted.clear();
++ AddedPhis.clear();
++ Predicates.clear();
++ Conditions.clear();
++ Loops.clear();
++ LoopPreds.clear();
++ LoopConds.clear();
+
+ return true;
+}
@@ -3447,10 +3815,10 @@ index 0000000..cab7884
+#endif // AMDGPUSUBTARGET_H
diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
new file mode 100644
-index 0000000..d09dc2e
+index 0000000..e2f00be
--- /dev/null
+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
-@@ -0,0 +1,142 @@
+@@ -0,0 +1,153 @@
+//===-- AMDGPUTargetMachine.cpp - TargetMachine for hw codegen targets-----===//
+//
+// The LLVM Compiler Infrastructure
@@ -3555,6 +3923,12 @@ index 0000000..d09dc2e
+bool AMDGPUPassConfig::addInstSelector() {
+ addPass(createAMDGPUPeepholeOpt(*TM));
+ addPass(createAMDGPUISelDag(getAMDGPUTargetMachine()));
++
++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
++ // The callbacks this pass uses are not implemented yet on SI.
++ addPass(createAMDGPUIndirectAddressingPass(*TM));
++ }
+ return false;
+}
+
@@ -3569,6 +3943,11 @@ index 0000000..d09dc2e
+}
+
+bool AMDGPUPassConfig::addPostRegAlloc() {
++ const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++
++ if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
++ addPass(createSIInsertWaits(*TM));
++ }
+ return false;
+}
+
@@ -3585,8 +3964,8 @@ index 0000000..d09dc2e
+ addPass(createAMDGPUCFGStructurizerPass(*TM));
+ addPass(createR600ExpandSpecialInstrsPass(*TM));
+ addPass(&FinalizeMachineBundlesID);
++ addPass(createR600LowerConstCopy(*TM));
+ } else {
-+ addPass(createSILowerLiteralConstantsPass(*TM));
+ addPass(createSILowerControlFlowPass(*TM));
+ }
+
@@ -3595,7 +3974,7 @@ index 0000000..d09dc2e
+
diff --git a/lib/Target/R600/AMDGPUTargetMachine.h b/lib/Target/R600/AMDGPUTargetMachine.h
new file mode 100644
-index 0000000..399e55c
+index 0000000..5a1dcf4
--- /dev/null
+++ b/lib/Target/R600/AMDGPUTargetMachine.h
@@ -0,0 +1,70 @@
@@ -3616,9 +3995,9 @@ index 0000000..399e55c
+#ifndef AMDGPU_TARGET_MACHINE_H
+#define AMDGPU_TARGET_MACHINE_H
+
++#include "AMDGPUFrameLowering.h"
+#include "AMDGPUInstrInfo.h"
+#include "AMDGPUSubtarget.h"
-+#include "AMDILFrameLowering.h"
+#include "AMDILIntrinsicInfo.h"
+#include "R600ISelLowering.h"
+#include "llvm/ADT/OwningPtr.h"
@@ -3671,10 +4050,10 @@ index 0000000..399e55c
+#endif // AMDGPU_TARGET_MACHINE_H
diff --git a/lib/Target/R600/AMDIL.h b/lib/Target/R600/AMDIL.h
new file mode 100644
-index 0000000..4e577dc
+index 0000000..b39fbdb
--- /dev/null
+++ b/lib/Target/R600/AMDIL.h
-@@ -0,0 +1,106 @@
+@@ -0,0 +1,122 @@
+//===-- AMDIL.h - Top-level interface for AMDIL representation --*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -3767,14 +4146,30 @@ index 0000000..4e577dc
+enum AddressSpaces {
+ PRIVATE_ADDRESS = 0, ///< Address space for private memory.
+ GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0).
-+ CONSTANT_ADDRESS = 2, ///< Address space for constant memory.
++ CONSTANT_ADDRESS = 2, ///< Address space for constant memory
+ LOCAL_ADDRESS = 3, ///< Address space for local memory.
+ REGION_ADDRESS = 4, ///< Address space for region memory.
+ ADDRESS_NONE = 5, ///< Address space for unknown memory.
+ PARAM_D_ADDRESS = 6, ///< Address space for direct addressable parameter memory (CONST0)
+ PARAM_I_ADDRESS = 7, ///< Address space for indirect addressable parameter memory (VTX1)
+ USER_SGPR_ADDRESS = 8, ///< Address space for USER_SGPRS on SI
-+ LAST_ADDRESS = 9
++ CONSTANT_BUFFER_0 = 9,
++ CONSTANT_BUFFER_1 = 10,
++ CONSTANT_BUFFER_2 = 11,
++ CONSTANT_BUFFER_3 = 12,
++ CONSTANT_BUFFER_4 = 13,
++ CONSTANT_BUFFER_5 = 14,
++ CONSTANT_BUFFER_6 = 15,
++ CONSTANT_BUFFER_7 = 16,
++ CONSTANT_BUFFER_8 = 17,
++ CONSTANT_BUFFER_9 = 18,
++ CONSTANT_BUFFER_10 = 19,
++ CONSTANT_BUFFER_11 = 20,
++ CONSTANT_BUFFER_12 = 21,
++ CONSTANT_BUFFER_13 = 22,
++ CONSTANT_BUFFER_14 = 23,
++ CONSTANT_BUFFER_15 = 24,
++ LAST_ADDRESS = 25
+};
+
+} // namespace AMDGPUAS
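
The sixteen CONSTANT_BUFFER_* entries are contiguous, so buffer N lives in
address space 9 + N. A tiny sketch of that arithmetic; the helper below is
invented for illustration and is not part of the patch:

#include <cassert>

const unsigned FirstConstantBuffer = 9;  // CONSTANT_BUFFER_0 above
const unsigned NumConstantBuffers  = 16;

// Hypothetical helper: address space of constant buffer N.
unsigned constantBufferAddressSpace(unsigned BufferId) {
  assert(BufferId < NumConstantBuffers && "only 16 constant buffers");
  return FirstConstantBuffer + BufferId;
}

int main() {
  assert(constantBufferAddressSpace(0)  == 9);  // CONSTANT_BUFFER_0
  assert(constantBufferAddressSpace(15) == 24); // CONSTANT_BUFFER_15
  return 0;
}
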
@@ -4073,10 +4468,10 @@ index 0000000..c12cedc
+
diff --git a/lib/Target/R600/AMDILCFGStructurizer.cpp b/lib/Target/R600/AMDILCFGStructurizer.cpp
new file mode 100644
-index 0000000..9de97b6
+index 0000000..568d281
--- /dev/null
+++ b/lib/Target/R600/AMDILCFGStructurizer.cpp
-@@ -0,0 +1,3049 @@
+@@ -0,0 +1,3045 @@
+//===-- AMDILCFGStructurizer.cpp - CFG Structurizer -----------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -6101,9 +6496,7 @@ index 0000000..9de97b6
+ CFGTraits::insertAssignInstrBefore(insertPos, passRep, immReg, 1);
+ InstrT *newInstr =
+ CFGTraits::insertInstrBefore(insertPos, AMDGPU::BRANCH_COND_i32, passRep);
-+ MachineInstrBuilder MIB(*funcRep, newInstr);
-+ MIB.addMBB(loopHeader);
-+ MIB.addReg(immReg, false);
++ MachineInstrBuilder(newInstr).addMBB(loopHeader).addReg(immReg, false);
+
+ SHOWNEWINSTR(newInstr);
+
@@ -6925,12 +7318,13 @@ index 0000000..9de97b6
+ MachineInstr *oldInstr = &(*instrPos);
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
+ MachineBasicBlock *blk = oldInstr->getParent();
-+ MachineFunction *MF = blk->getParent();
-+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
++ MachineInstr *newInstr =
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode),
++ DL);
+
+ blk->insert(instrPos, newInstr);
-+ MachineInstrBuilder MIB(*MF, newInstr);
-+ MIB.addReg(oldInstr->getOperand(1).getReg(), false);
++ MachineInstrBuilder(newInstr).addReg(oldInstr->getOperand(1).getReg(),
++ false);
+
+ SHOWNEWINSTR(newInstr);
+ //erase later oldInstr->eraseFromParent();
@@ -6943,13 +7337,13 @@ index 0000000..9de97b6
+ RegiT regNum,
+ DebugLoc DL) {
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
-+ MachineFunction *MF = blk->getParent();
+
-+ MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
++ MachineInstr *newInstr =
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DL);
+
+ //insert before
+ blk->insert(insertPos, newInstr);
-+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
++ MachineInstrBuilder(newInstr).addReg(regNum, false);
+
+ SHOWNEWINSTR(newInstr);
+ } //insertCondBranchBefore
@@ -6959,12 +7353,11 @@ index 0000000..9de97b6
+ AMDGPUCFGStructurizer *passRep,
+ RegiT regNum) {
+ const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
-+ MachineFunction *MF = blk->getParent();
+ MachineInstr *newInstr =
-+ MF->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
++ blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
+
+ blk->push_back(newInstr);
-+ MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
++ MachineInstrBuilder(newInstr).addReg(regNum, false);
+
+ SHOWNEWINSTR(newInstr);
+ } //insertCondBranchEnd
@@ -7009,14 +7402,12 @@ index 0000000..9de97b6
+ RegiT src2Reg) {
+ const AMDGPUInstrInfo *tii =
+ static_cast<const AMDGPUInstrInfo *>(passRep->getTargetInstrInfo());
-+ MachineFunction *MF = blk->getParent();
+ MachineInstr *newInstr =
-+ MF->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
++ blk->getParent()->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
+
-+ MachineInstrBuilder MIB(*MF, newInstr);
-+ MIB.addReg(dstReg, RegState::Define); //set target
-+ MIB.addReg(src1Reg); //set src value
-+ MIB.addReg(src2Reg); //set src value
++ MachineInstrBuilder(newInstr).addReg(dstReg, RegState::Define); //set target
++ MachineInstrBuilder(newInstr).addReg(src1Reg); //set src value
++ MachineInstrBuilder(newInstr).addReg(src2Reg); //set src value
+
+ blk->insert(instrPos, newInstr);
+ SHOWNEWINSTR(newInstr);
@@ -7872,13 +8263,13 @@ index 0000000..6dc2deb
+
+} // namespace llvm
+#endif // AMDILEVERGREENDEVICE_H
-diff --git a/lib/Target/R600/AMDILFrameLowering.cpp b/lib/Target/R600/AMDILFrameLowering.cpp
+diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
new file mode 100644
-index 0000000..9ad495a
+index 0000000..2e726e9
--- /dev/null
-+++ b/lib/Target/R600/AMDILFrameLowering.cpp
-@@ -0,0 +1,47 @@
-+//===----------------------- AMDILFrameLowering.cpp -----------------*- C++ -*-===//
++++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
+@@ -0,0 +1,577 @@
++//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
+//
+// The LLVM Compiler Infrastructure
+//
@@ -7888,119 +8279,21 @@ index 0000000..9ad495a
+//==-----------------------------------------------------------------------===//
+//
+/// \file
-+/// \brief Interface to describe a layout of a stack frame on a AMDGPU target
-+/// machine.
++/// \brief Defines an instruction selector for the AMDGPU target.
+//
+//===----------------------------------------------------------------------===//
-+#include "AMDILFrameLowering.h"
-+#include "llvm/CodeGen/MachineFrameInfo.h"
-+
-+using namespace llvm;
-+AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
-+ int LAO, unsigned TransAl)
-+ : TargetFrameLowering(D, StackAl, LAO, TransAl) {
-+}
-+
-+AMDGPUFrameLowering::~AMDGPUFrameLowering() {
-+}
-+
-+int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
-+ int FI) const {
-+ const MachineFrameInfo *MFI = MF.getFrameInfo();
-+ return MFI->getObjectOffset(FI);
-+}
-+
-+const TargetFrameLowering::SpillSlot *
-+AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
-+ NumEntries = 0;
-+ return 0;
-+}
-+void
-+AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
-+}
-+void
-+AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const {
-+}
-+bool
-+AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
-+ return false;
-+}
-diff --git a/lib/Target/R600/AMDILFrameLowering.h b/lib/Target/R600/AMDILFrameLowering.h
-new file mode 100644
-index 0000000..51337c3
---- /dev/null
-+++ b/lib/Target/R600/AMDILFrameLowering.h
-@@ -0,0 +1,40 @@
-+//===--------------------- AMDILFrameLowering.h -----------------*- C++ -*-===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief Interface to describe a layout of a stack frame on a AMDIL target
-+/// machine.
-+//
-+//===----------------------------------------------------------------------===//
-+#ifndef AMDILFRAME_LOWERING_H
-+#define AMDILFRAME_LOWERING_H
-+
-+#include "llvm/CodeGen/MachineFunction.h"
-+#include "llvm/Target/TargetFrameLowering.h"
-+
-+namespace llvm {
-+
-+/// \brief Information about the stack frame layout on the AMDGPU targets.
-+///
-+/// It holds the direction of the stack growth, the known stack alignment on
-+/// entry to each function, and the offset to the locals area.
-+/// See TargetFrameInfo for more comments.
-+class AMDGPUFrameLowering : public TargetFrameLowering {
-+public:
-+ AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
-+ unsigned TransAl = 1);
-+ virtual ~AMDGPUFrameLowering();
-+ virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
-+ virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
-+ virtual void emitPrologue(MachineFunction &MF) const;
-+ virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
-+ virtual bool hasFP(const MachineFunction &MF) const;
-+};
-+} // namespace llvm
-+#endif // AMDILFRAME_LOWERING_H
-diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
-new file mode 100644
-index 0000000..d15ed39
---- /dev/null
-+++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
-@@ -0,0 +1,485 @@
-+//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//==-----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief Defines an instruction selector for the AMDGPU target.
-+//
-+//===----------------------------------------------------------------------===//
-+#include "AMDGPUInstrInfo.h"
-+#include "AMDGPUISelLowering.h" // For AMDGPUISD
-+#include "AMDGPURegisterInfo.h"
-+#include "AMDILDevices.h"
-+#include "R600InstrInfo.h"
-+#include "llvm/ADT/ValueMap.h"
-+#include "llvm/CodeGen/PseudoSourceValue.h"
-+#include "llvm/CodeGen/SelectionDAGISel.h"
-+#include "llvm/Support/Compiler.h"
-+#include <list>
-+#include <queue>
++#include "AMDGPUInstrInfo.h"
++#include "AMDGPUISelLowering.h" // For AMDGPUISD
++#include "AMDGPURegisterInfo.h"
++#include "AMDILDevices.h"
++#include "R600InstrInfo.h"
++#include "llvm/ADT/ValueMap.h"
++#include "llvm/CodeGen/PseudoSourceValue.h"
++#include "llvm/CodeGen/SelectionDAGISel.h"
++#include "llvm/Support/Compiler.h"
++#include "llvm/CodeGen/SelectionDAG.h"
++#include <list>
++#include <queue>
+
+using namespace llvm;
+
@@ -8024,6 +8317,7 @@ index 0000000..d15ed39
+
+private:
+ inline SDValue getSmallIPtrImm(unsigned Imm);
++ bool FoldOperands(unsigned, const R600InstrInfo *, std::vector<SDValue> &);
+
+ // Complex pattern selectors
+ bool SelectADDRParam(SDValue Addr, SDValue& R1, SDValue& R2);
@@ -8046,9 +8340,11 @@ index 0000000..d15ed39
+ static bool isLocalLoad(const LoadSDNode *N);
+ static bool isRegionLoad(const LoadSDNode *N);
+
-+ bool SelectADDR8BitOffset(SDValue Addr, SDValue& Base, SDValue& Offset);
-+ bool SelectADDRReg(SDValue Addr, SDValue& Base, SDValue& Offset);
++ bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
++ bool SelectGlobalValueVariableOffset(SDValue Addr,
++ SDValue &BaseReg, SDValue& Offset);
+ bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);
++ bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);
+
+ // Include the pieces autogenerated from the target description.
+#include "AMDGPUGenDAGISel.inc"
@@ -8135,16 +8431,6 @@ index 0000000..d15ed39
+ }
+ switch (Opc) {
+ default: break;
-+ case ISD::FrameIndex: {
-+ if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(N)) {
-+ unsigned int FI = FIN->getIndex();
-+ EVT OpVT = N->getValueType(0);
-+ unsigned int NewOpc = AMDGPU::COPY;
-+ SDValue TFI = CurDAG->getTargetFrameIndex(FI, MVT::i32);
-+ return CurDAG->SelectNodeTo(N, NewOpc, OpVT, TFI);
-+ }
-+ break;
-+ }
+ case ISD::ConstantFP:
+ case ISD::Constant: {
+ const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
@@ -8203,7 +8489,9 @@ index 0000000..d15ed39
+ continue;
+ }
+ } else {
-+ if (!TII->isALUInstr(Use->getMachineOpcode())) {
++ if (!TII->isALUInstr(Use->getMachineOpcode()) ||
++ (TII->get(Use->getMachineOpcode()).TSFlags &
++ R600_InstFlag::VECTOR)) {
+ continue;
+ }
+
@@ -8238,7 +8526,116 @@ index 0000000..d15ed39
+ break;
+ }
+ }
-+ return SelectCode(N);
++ SDNode *Result = SelectCode(N);
++
++ // Fold operands of selected node
++
++ const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
++ if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
++ const R600InstrInfo *TII =
++ static_cast<const R600InstrInfo*>(TM.getInstrInfo());
++ if (Result && Result->isMachineOpcode() &&
++ !(TII->get(Result->getMachineOpcode()).TSFlags & R600_InstFlag::VECTOR)
++ && TII->isALUInstr(Result->getMachineOpcode())) {
++ // Fold FNEG/FABS/CONST_ADDRESS
++      // TODO: ISel can generate multiple MachineInsts; we need to process
++      // Result recursively.
++ bool IsModified = false;
++ do {
++ std::vector<SDValue> Ops;
++ for(SDNode::op_iterator I = Result->op_begin(), E = Result->op_end();
++ I != E; ++I)
++ Ops.push_back(*I);
++ IsModified = FoldOperands(Result->getMachineOpcode(), TII, Ops);
++ if (IsModified) {
++ Result = CurDAG->UpdateNodeOperands(Result, Ops.data(), Ops.size());
++ }
++ } while (IsModified);
++
++    // If the node has a single use which is CLAMP_R600, fold it in.
++ if (Result->hasOneUse() && Result->isMachineOpcode()) {
++ SDNode *PotentialClamp = *Result->use_begin();
++ if (PotentialClamp->isMachineOpcode() &&
++ PotentialClamp->getMachineOpcode() == AMDGPU::CLAMP_R600) {
++ unsigned ClampIdx =
++ TII->getOperandIdx(Result->getMachineOpcode(), R600Operands::CLAMP);
++ std::vector<SDValue> Ops;
++ unsigned NumOp = Result->getNumOperands();
++ for (unsigned i = 0; i < NumOp; ++i) {
++ Ops.push_back(Result->getOperand(i));
++ }
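++          // getOperandIdx indexes MachineInstr operands, which include the
++          // def at slot 0; SDNode operand lists omit it, hence the -1.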
++ Ops[ClampIdx - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++ Result = CurDAG->SelectNodeTo(PotentialClamp,
++ Result->getMachineOpcode(), PotentialClamp->getVTList(),
++ Ops.data(), NumOp);
++ }
++ }
++ }
++ }
++
++ return Result;
++}
++
++bool AMDGPUDAGToDAGISel::FoldOperands(unsigned Opcode,
++ const R600InstrInfo *TII, std::vector<SDValue> &Ops) {
++ int OperandIdx[] = {
++ TII->getOperandIdx(Opcode, R600Operands::SRC0),
++ TII->getOperandIdx(Opcode, R600Operands::SRC1),
++ TII->getOperandIdx(Opcode, R600Operands::SRC2)
++ };
++ int SelIdx[] = {
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_SEL),
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_SEL),
++ TII->getOperandIdx(Opcode, R600Operands::SRC2_SEL)
++ };
++ int NegIdx[] = {
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_NEG),
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_NEG),
++ TII->getOperandIdx(Opcode, R600Operands::SRC2_NEG)
++ };
++ int AbsIdx[] = {
++ TII->getOperandIdx(Opcode, R600Operands::SRC0_ABS),
++ TII->getOperandIdx(Opcode, R600Operands::SRC1_ABS),
++ -1
++ };
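++  // The R600 ISA has no SRC2_ABS modifier, so the third slot stays -1 and
++  // FABS folding is skipped for it below.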
++
++ for (unsigned i = 0; i < 3; i++) {
++ if (OperandIdx[i] < 0)
++ return false;
++ SDValue Operand = Ops[OperandIdx[i] - 1];
++ switch (Operand.getOpcode()) {
++ case AMDGPUISD::CONST_ADDRESS: {
++ if (i == 2)
++ break;
++ SDValue CstOffset;
++ if (!Operand.getValueType().isVector() &&
++ SelectGlobalValueConstantOffset(Operand.getOperand(0), CstOffset)) {
++ Ops[OperandIdx[i] - 1] = CurDAG->getRegister(AMDGPU::ALU_CONST, MVT::f32);
++ Ops[SelIdx[i] - 1] = CstOffset;
++ return true;
++ }
++ }
++ break;
++ case ISD::FNEG:
++ if (NegIdx[i] < 0)
++ break;
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++ Ops[NegIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++ return true;
++ case ISD::FABS:
++ if (AbsIdx[i] < 0)
++ break;
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++ Ops[AbsIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++ return true;
++ case ISD::BITCAST:
++ Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++ return true;
++ default:
++ break;
++ }
++ }
++ return false;
+}
+
+bool AMDGPUDAGToDAGISel::checkType(const Value *ptr, unsigned int addrspace) {
@@ -8385,41 +8782,23 @@ index 0000000..d15ed39
+
+///==== AMDGPU Functions ====///
+
-+bool AMDGPUDAGToDAGISel::SelectADDR8BitOffset(SDValue Addr, SDValue& Base,
-+ SDValue& Offset) {
-+ if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
-+ Addr.getOpcode() == ISD::TargetGlobalAddress) {
-+ return false;
++bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
++ SDValue& IntPtr) {
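++  // The constant offset is in bytes, but the hardware select addresses
++  // 32-bit words, hence the division by 4.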
++ if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
++ IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, true);
++ return true;
+ }
++ return false;
++}
+
-+
-+ if (Addr.getOpcode() == ISD::ADD) {
-+ bool Match = false;
-+
-+ // Find the base ptr and the offset
-+ for (unsigned i = 0; i < Addr.getNumOperands(); i++) {
-+ SDValue Arg = Addr.getOperand(i);
-+ ConstantSDNode * OffsetNode = dyn_cast<ConstantSDNode>(Arg);
-+ // This arg isn't a constant so it must be the base PTR.
-+ if (!OffsetNode) {
-+ Base = Addr.getOperand(i);
-+ continue;
-+ }
-+ // Check if the constant argument fits in 8-bits. The offset is in bytes
-+ // so we need to convert it to dwords.
-+ if (isUInt<8>(OffsetNode->getZExtValue() >> 2)) {
-+ Match = true;
-+ Offset = CurDAG->getTargetConstant(OffsetNode->getZExtValue() >> 2,
-+ MVT::i32);
-+ }
-+ }
-+ return Match;
++bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
++ SDValue& BaseReg, SDValue &Offset) {
++ if (!dyn_cast<ConstantSDNode>(Addr)) {
++ BaseReg = Addr;
++ Offset = CurDAG->getIntPtrConstant(0, true);
++ return true;
+ }
-+
-+ // Default case, no offset
-+ Base = Addr;
-+ Offset = CurDAG->getTargetConstant(0, MVT::i32);
-+ return true;
++ return false;
+}
+
+bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
@@ -8449,16 +8828,21 @@ index 0000000..d15ed39
+ return true;
+}
+
-+bool AMDGPUDAGToDAGISel::SelectADDRReg(SDValue Addr, SDValue& Base,
-+ SDValue& Offset) {
-+ if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
-+ Addr.getOpcode() == ISD::TargetGlobalAddress ||
-+ Addr.getOpcode() != ISD::ADD) {
-+ return false;
-+ }
++bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
++ SDValue &Offset) {
++ ConstantSDNode *C;
+
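++  // Three address forms are matched: an absolute constant offset from the
++  // implicit indirect base register, base-plus-constant (ADD or OR), and a
++  // plain pointer with a zero offset.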
-+ Base = Addr.getOperand(0);
-+ Offset = Addr.getOperand(1);
++ if ((C = dyn_cast<ConstantSDNode>(Addr))) {
++ Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);
++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
++ } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
++ (C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
++ Base = Addr.getOperand(0);
++ Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
++ } else {
++ Base = Addr;
++ Offset = CurDAG->getTargetConstant(0, MVT::i32);
++ }
+
+ return true;
+}
@@ -9857,10 +10241,10 @@ index 0000000..bc7df37
+#endif // AMDILNIDEVICE_H
diff --git a/lib/Target/R600/AMDILPeepholeOptimizer.cpp b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
new file mode 100644
-index 0000000..4a748b8
+index 0000000..57317ac
--- /dev/null
+++ b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
-@@ -0,0 +1,1215 @@
+@@ -0,0 +1,1256 @@
+//===-- AMDILPeepholeOptimizer.cpp - AMDGPU Peephole optimizations ---------===//
+//
+// The LLVM Compiler Infrastructure
@@ -10409,14 +10793,51 @@ index 0000000..4a748b8
+ lhsMaskOffset = lhsMaskVal ? CountTrailingZeros_32(lhsMaskVal) : lhsShiftVal;
+ rhsMaskOffset = rhsMaskVal ? CountTrailingZeros_32(rhsMaskVal) : rhsShiftVal;
+ // TODO: Handle the case of A & B | D & ~B(i.e. inverted masks).
++ if (mDebug) {
++ dbgs() << "Found pattern: \'((A" << (LHSMask ? " & B)" : ")");
++ dbgs() << (LHSShift ? " << C)" : ")") << " | ((D" ;
++ dbgs() << (RHSMask ? " & E)" : ")");
++ dbgs() << (RHSShift ? " << F)\'\n" : ")\'\n");
++ dbgs() << "A = LHSSrc\t\tD = RHSSrc \n";
++ dbgs() << "B = " << lhsMaskVal << "\t\tE = " << rhsMaskVal << "\n";
++ dbgs() << "C = " << lhsShiftVal << "\t\tF = " << rhsShiftVal << "\n";
++ dbgs() << "width(B) = " << lhsMaskWidth;
++ dbgs() << "\twidth(E) = " << rhsMaskWidth << "\n";
++ dbgs() << "offset(B) = " << lhsMaskOffset;
++ dbgs() << "\toffset(E) = " << rhsMaskOffset << "\n";
++ dbgs() << "Constraints: \n";
++ dbgs() << "\t(1) B ^ E == 0\n";
++ dbgs() << "\t(2-LHS) B is a mask\n";
++ dbgs() << "\t(2-LHS) E is a mask\n";
++ dbgs() << "\t(3-LHS) (offset(B)) >= (width(E) + offset(E))\n";
++ dbgs() << "\t(3-RHS) (offset(E)) >= (width(B) + offset(B))\n";
++ }
+ if ((lhsMaskVal || rhsMaskVal) && !(lhsMaskVal ^ rhsMaskVal)) {
++ if (mDebug) {
++ dbgs() << lhsMaskVal << " ^ " << rhsMaskVal;
++ dbgs() << " = " << (lhsMaskVal ^ rhsMaskVal) << "\n";
++ dbgs() << "Failed constraint 1!\n";
++ }
+ return false;
+ }
++ if (mDebug) {
++ dbgs() << "LHS = " << lhsMaskOffset << "";
++ dbgs() << " >= (" << rhsMaskWidth << " + " << rhsMaskOffset << ") = ";
++ dbgs() << (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset));
++ dbgs() << "\nRHS = " << rhsMaskOffset << "";
++ dbgs() << " >= (" << lhsMaskWidth << " + " << lhsMaskOffset << ") = ";
++ dbgs() << (rhsMaskOffset >= (lhsMaskWidth + lhsMaskOffset));
++ dbgs() << "\n";
++ }
+ if (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset)) {
+ offset = ConstantInt::get(aType, lhsMaskOffset, false);
+ width = ConstantInt::get(aType, lhsMaskWidth, false);
+ RHSSrc = RHS;
+ if (!isMask_32(lhsMaskVal) && !isShiftedMask_32(lhsMaskVal)) {
++ if (mDebug) {
++ dbgs() << "Value is not a Mask: " << lhsMaskVal << "\n";
++ dbgs() << "Failed constraint 2!\n";
++ }
+ return false;
+ }
+ if (!LHSShift) {
@@ -10435,6 +10856,10 @@ index 0000000..4a748b8
+ LHSSrc = RHSSrc;
+ RHSSrc = LHS;
+ if (!isMask_32(rhsMaskVal) && !isShiftedMask_32(rhsMaskVal)) {
++ if (mDebug) {
++ dbgs() << "Non-Mask: " << rhsMaskVal << "\n";
++ dbgs() << "Failed constraint 2!\n";
++ }
+ return false;
+ }
+ if (!RHSShift) {
@@ -11287,10 +11712,10 @@ index 0000000..5b2cb25
+#endif // AMDILSIDEVICE_H
diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt
new file mode 100644
-index 0000000..ce0b56b
+index 0000000..8ef9f8c
--- /dev/null
+++ b/lib/Target/R600/CMakeLists.txt
-@@ -0,0 +1,55 @@
+@@ -0,0 +1,56 @@
+set(LLVM_TARGET_DEFINITIONS AMDGPU.td)
+
+tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)
@@ -11304,7 +11729,7 @@ index 0000000..ce0b56b
+tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
+add_public_tablegen_target(AMDGPUCommonTableGen)
+
-+add_llvm_target(R600CodeGen
++add_llvm_target(AMDGPUCodeGen
+ AMDIL7XXDevice.cpp
+ AMDILCFGStructurizer.cpp
+ AMDILDevice.cpp
@@ -11318,9 +11743,9 @@ index 0000000..ce0b56b
+ AMDILPeepholeOptimizer.cpp
+ AMDILSIDevice.cpp
+ AMDGPUAsmPrinter.cpp
++ AMDGPUIndirectAddressing.cpp
+ AMDGPUMCInstLower.cpp
+ AMDGPUSubtarget.cpp
-+ AMDGPUStructurizeCFG.cpp
+ AMDGPUTargetMachine.cpp
+ AMDGPUISelLowering.cpp
+ AMDGPUConvertToISA.cpp
@@ -11329,9 +11754,9 @@ index 0000000..ce0b56b
+ R600ExpandSpecialInstrs.cpp
+ R600InstrInfo.cpp
+ R600ISelLowering.cpp
++ R600LowerConstCopy.cpp
+ R600MachineFunctionInfo.cpp
+ R600RegisterInfo.cpp
-+ SIAnnotateControlFlow.cpp
+ SIAssignInterpRegs.cpp
+ SIInstrInfo.cpp
+ SIISelLowering.cpp
@@ -11339,6 +11764,7 @@ index 0000000..ce0b56b
+ SILowerControlFlow.cpp
+ SIMachineFunctionInfo.cpp
+ SIRegisterInfo.cpp
++ SIFixSGPRLiveness.cpp
+ )
+
+add_dependencies(LLVMR600CodeGen intrinsics_gen)
@@ -11348,10 +11774,10 @@ index 0000000..ce0b56b
+add_subdirectory(MCTargetDesc)
diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
new file mode 100644
-index 0000000..e6c550b
+index 0000000..d6450a0
--- /dev/null
+++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
-@@ -0,0 +1,132 @@
+@@ -0,0 +1,168 @@
+//===-- AMDGPUInstPrinter.cpp - AMDGPU MC Inst -> ASM ---------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -11394,6 +11820,21 @@ index 0000000..e6c550b
+ }
+}
+
++void AMDGPUInstPrinter::printInterpSlot(const MCInst *MI, unsigned OpNum,
++ raw_ostream &O) {
++ unsigned Imm = MI->getOperand(OpNum).getImm();
++
++ if (Imm == 2) {
++ O << "P0";
++ } else if (Imm == 1) {
++ O << "P20";
++ } else if (Imm == 0) {
++ O << "P10";
++ } else {
++ assert(!"Invalid interpolation parameter slot");
++ }
++}
++
+void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,
+ raw_ostream &O) {
+ printOperand(MI, OpNo, O);
@@ -11459,10 +11900,7 @@ index 0000000..e6c550b
+
+void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo,
+ raw_ostream &O) {
-+ const MCOperand &Op = MI->getOperand(OpNo);
-+ if (Op.getImm() != 0) {
-+ O << " + " << Op.getImm();
-+ }
++ printIfSet(MI, OpNo, O, "+");
+}
+
+void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo,
@@ -11483,13 +11921,37 @@ index 0000000..e6c550b
+ }
+}
+
++void AMDGPUInstPrinter::printSel(const MCInst *MI, unsigned OpNo,
++ raw_ostream &O) {
++ const char * chans = "XYZW";
++ int sel = MI->getOperand(OpNo).getImm();
++
++ int chan = sel & 3;
++ sel >>= 2;
++
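++  // The low two bits of the select encode the channel; after the shift,
++  // selects of 512 and up address a constant buffer, anything lower prints
++  // as a plain index.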
++ if (sel >= 512) {
++ sel -= 512;
++ int cb = sel >> 12;
++ sel &= 4095;
++ O << cb << "[" << sel << "]";
++ } else if (sel >= 448) {
++ sel -= 448;
++ O << sel;
++  } else if (sel >= 0) {
++ O << sel;
++ }
++
++ if (sel >= 0)
++ O << "." << chans[chan];
++}
++
+#include "AMDGPUGenAsmWriter.inc"
diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
new file mode 100644
-index 0000000..96e0e46
+index 0000000..767a708
--- /dev/null
+++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
-@@ -0,0 +1,52 @@
+@@ -0,0 +1,54 @@
+//===-- AMDGPUInstPrinter.h - AMDGPU MC Inst -> ASM interface ---*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -11525,6 +11987,7 @@ index 0000000..96e0e46
+
+private:
+ void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
++ void printInterpSlot(const MCInst *MI, unsigned OpNum, raw_ostream &O);
+ void printMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
+ void printIfSet(const MCInst *MI, unsigned OpNo, raw_ostream &O, StringRef Asm);
+ void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);
@@ -11537,6 +12000,7 @@ index 0000000..96e0e46
+ void printUpdateExecMask(const MCInst *MI, unsigned OpNo, raw_ostream &O);
+ void printUpdatePred(const MCInst *MI, unsigned OpNo, raw_ostream &O);
+ void printWrite(const MCInst *MI, unsigned OpNo, raw_ostream &O);
++ void printSel(const MCInst *MI, unsigned OpNo, raw_ostream &O);
+};
+
+} // End namespace llvm
@@ -11544,7 +12008,7 @@ index 0000000..96e0e46
+#endif // AMDGPUINSTRPRINTER_H
diff --git a/lib/Target/R600/InstPrinter/CMakeLists.txt b/lib/Target/R600/InstPrinter/CMakeLists.txt
new file mode 100644
-index 0000000..069c55b
+index 0000000..6776337
--- /dev/null
+++ b/lib/Target/R600/InstPrinter/CMakeLists.txt
@@ -0,0 +1,7 @@
@@ -11554,7 +12018,7 @@ index 0000000..069c55b
+ AMDGPUInstPrinter.cpp
+ )
+
-+add_dependencies(LLVMR600AsmPrinter AMDGPUCommonTableGen)
++add_dependencies(LLVMR600AsmPrinter AMDGPUCommonTableGen)
diff --git a/lib/Target/R600/InstPrinter/LLVMBuild.txt b/lib/Target/R600/InstPrinter/LLVMBuild.txt
new file mode 100644
index 0000000..ec0be89
@@ -11869,10 +12333,10 @@ index 0000000..3ad0fa6
+#endif // AMDGPUMCASMINFO_H
diff --git a/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
new file mode 100644
-index 0000000..9d0d6cf
+index 0000000..8721f80
--- /dev/null
+++ b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
-@@ -0,0 +1,60 @@
+@@ -0,0 +1,49 @@
+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -11917,17 +12381,6 @@ index 0000000..9d0d6cf
+ SmallVectorImpl<MCFixup> &Fixups) const {
+ return 0;
+ }
-+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const {
-+ return Value;
-+ }
-+ virtual uint64_t i32LiteralEncode(const MCInst &MI, unsigned OpNo,
-+ SmallVectorImpl<MCFixup> &Fixups) const {
-+ return 0;
-+ }
-+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+ SmallVectorImpl<MCFixup> &Fixups) const {
-+ return 0;
-+ }
+};
+
+} // End namespace llvm
@@ -12182,10 +12635,10 @@ index 0000000..8894a76
+include $(LEVEL)/Makefile.common
diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
new file mode 100644
-index 0000000..dc91924
+index 0000000..115fe8d
--- /dev/null
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
-@@ -0,0 +1,575 @@
+@@ -0,0 +1,582 @@
+//===- R600MCCodeEmitter.cpp - Code Emitter for R600->Cayman GPU families -===//
+//
+// The LLVM Compiler Infrastructure
@@ -12252,8 +12705,8 @@ index 0000000..dc91924
+ void EmitALUInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
+ raw_ostream &OS) const;
+ void EmitSrc(const MCInst &MI, unsigned OpIdx, raw_ostream &OS) const;
-+ void EmitSrcISA(const MCInst &MI, unsigned OpIdx, uint64_t &Value,
-+ raw_ostream &OS) const;
++ void EmitSrcISA(const MCInst &MI, unsigned RegOpIdx, unsigned SelOpIdx,
++ raw_ostream &OS) const;
+ void EmitDst(const MCInst &MI, raw_ostream &OS) const;
+ void EmitTexInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
+ raw_ostream &OS) const;
@@ -12350,9 +12803,12 @@ index 0000000..dc91924
+ case AMDGPU::VTX_READ_PARAM_8_eg:
+ case AMDGPU::VTX_READ_PARAM_16_eg:
+ case AMDGPU::VTX_READ_PARAM_32_eg:
++ case AMDGPU::VTX_READ_PARAM_128_eg:
+ case AMDGPU::VTX_READ_GLOBAL_8_eg:
+ case AMDGPU::VTX_READ_GLOBAL_32_eg:
-+ case AMDGPU::VTX_READ_GLOBAL_128_eg: {
++ case AMDGPU::VTX_READ_GLOBAL_128_eg:
++ case AMDGPU::TEX_VTX_CONSTBUF:
++  case AMDGPU::TEX_VTX_TEXBUF: {
+ uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups);
+ uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
+
@@ -12382,7 +12838,6 @@ index 0000000..dc91924
+ SmallVectorImpl<MCFixup> &Fixups,
+ raw_ostream &OS) const {
+ const MCInstrDesc &MCDesc = MCII.get(MI.getOpcode());
-+ unsigned NumOperands = MI.getNumOperands();
+
+ // Emit instruction type
+ EmitByte(INSTR_ALU, OS);
@@ -12398,19 +12853,21 @@ index 0000000..dc91924
+ InstWord01 |= ISAOpCode << 1;
+ }
+
-+ unsigned SrcIdx = 0;
-+ for (unsigned int OpIdx = 1; OpIdx < NumOperands; ++OpIdx) {
-+ if (MI.getOperand(OpIdx).isImm() || MI.getOperand(OpIdx).isFPImm() ||
-+ OpIdx == (unsigned)MCDesc.findFirstPredOperandIdx()) {
-+ continue;
-+ }
-+ EmitSrcISA(MI, OpIdx, InstWord01, OS);
-+ SrcIdx++;
-+ }
++ unsigned SrcNum = MCDesc.TSFlags & R600_InstFlag::OP3 ? 3 :
++ MCDesc.TSFlags & R600_InstFlag::OP2 ? 2 : 1;
++
++ EmitByte(SrcNum, OS);
++
++ const unsigned SrcOps[3][2] = {
++ {R600Operands::SRC0, R600Operands::SRC0_SEL},
++ {R600Operands::SRC1, R600Operands::SRC1_SEL},
++ {R600Operands::SRC2, R600Operands::SRC2_SEL}
++ };
+
-+ // Emit zeros for unused sources
-+ for ( ; SrcIdx < 3; SrcIdx++) {
-+ EmitNullBytes(SRC_BYTE_COUNT - 6, OS);
++ for (unsigned SrcIdx = 0; SrcIdx < SrcNum; ++SrcIdx) {
++ unsigned RegOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][0]];
++ unsigned SelOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][1]];
++ EmitSrcISA(MI, RegOpIdx, SelOpIdx, OS);
+ }
+
+ Emit(InstWord01, OS);
@@ -12481,34 +12938,37 @@ index 0000000..dc91924
+
+}
+
-+void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned OpIdx,
-+ uint64_t &Value, raw_ostream &OS) const {
-+ const MCOperand &MO = MI.getOperand(OpIdx);
++void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned RegOpIdx,
++ unsigned SelOpIdx, raw_ostream &OS) const {
++ const MCOperand &RegMO = MI.getOperand(RegOpIdx);
++ const MCOperand &SelMO = MI.getOperand(SelOpIdx);
++
+ union {
+ float f;
+ uint32_t i;
+ } InlineConstant;
+ InlineConstant.i = 0;
-+ // Emit the source select (2 bytes). For GPRs, this is the register index.
-+ // For other potential instruction operands, (e.g. constant registers) the
-+ // value of the source select is defined in the r600isa docs.
-+ if (MO.isReg()) {
-+ unsigned Reg = MO.getReg();
-+ if (AMDGPUMCRegisterClasses[AMDGPU::R600_CReg32RegClassID].contains(Reg)) {
-+ EmitByte(1, OS);
-+ } else {
-+ EmitByte(0, OS);
-+ }
++  // Emit source type (1 byte) and source select (4 bytes). For GPRs the type
++  // is 0 and the select is 0 (the GPR index is encoded in the instruction
++  // encoding). For constants the type is 1 and the select is the original
++  // const select passed from the driver.
++ unsigned Reg = RegMO.getReg();
++ if (Reg == AMDGPU::ALU_CONST) {
++ EmitByte(1, OS);
++ uint32_t Sel = SelMO.getImm();
++ Emit(Sel, OS);
++ } else {
++ EmitByte(0, OS);
++ Emit((uint32_t)0, OS);
++ }
+
-+ if (Reg == AMDGPU::ALU_LITERAL_X) {
-+ unsigned ImmOpIndex = MI.getNumOperands() - 1;
-+ MCOperand ImmOp = MI.getOperand(ImmOpIndex);
-+ if (ImmOp.isFPImm()) {
-+ InlineConstant.f = ImmOp.getFPImm();
-+ } else {
-+ assert(ImmOp.isImm());
-+ InlineConstant.i = ImmOp.getImm();
-+ }
++ if (Reg == AMDGPU::ALU_LITERAL_X) {
++ unsigned ImmOpIndex = MI.getNumOperands() - 1;
++ MCOperand ImmOp = MI.getOperand(ImmOpIndex);
++ if (ImmOp.isFPImm()) {
++ InlineConstant.f = ImmOp.getFPImm();
++ } else {
++ assert(ImmOp.isImm());
++ InlineConstant.i = ImmOp.getImm();
+ }
+ }
+
@@ -12763,10 +13223,10 @@ index 0000000..dc91924
+#include "AMDGPUGenMCCodeEmitter.inc"
diff --git a/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
new file mode 100644
-index 0000000..c47dc99
+index 0000000..6dfbbe8
--- /dev/null
+++ b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
-@@ -0,0 +1,298 @@
+@@ -0,0 +1,235 @@
+//===-- SIMCCodeEmitter.cpp - SI Code Emitter -------------------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -12793,38 +13253,16 @@ index 0000000..c47dc99
+#include "llvm/MC/MCFixup.h"
+#include "llvm/Support/raw_ostream.h"
+
-+#define VGPR_BIT(src_idx) (1ULL << (9 * src_idx - 1))
-+#define SI_INSTR_FLAGS_ENCODING_MASK 0xf
-+
-+// These must be kept in sync with SIInstructions.td and also the
-+// InstrEncodingInfo array in SIInstrInfo.cpp.
-+//
-+// NOTE: This enum is only used to identify the encoding type within LLVM,
-+// the actual encoding type that is part of the instruction format is different
-+namespace SIInstrEncodingType {
-+ enum Encoding {
-+ EXP = 0,
-+ LDS = 1,
-+ MIMG = 2,
-+ MTBUF = 3,
-+ MUBUF = 4,
-+ SMRD = 5,
-+ SOP1 = 6,
-+ SOP2 = 7,
-+ SOPC = 8,
-+ SOPK = 9,
-+ SOPP = 10,
-+ VINTRP = 11,
-+ VOP1 = 12,
-+ VOP2 = 13,
-+ VOP3 = 14,
-+ VOPC = 15
-+ };
-+}
-+
+using namespace llvm;
+
+namespace {
++
++/// \brief Helper type used in encoding
++typedef union {
++ int32_t I;
++ float F;
++} IntFloatUnion;
++
+class SIMCCodeEmitter : public AMDGPUMCCodeEmitter {
+ SIMCCodeEmitter(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
+ void operator=(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
@@ -12833,6 +13271,15 @@ index 0000000..c47dc99
+ const MCSubtargetInfo &STI;
+ MCContext &Ctx;
+
++ /// \brief Encode a sequence of registers with the correct alignment.
++ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
++
++ /// \brief Can this operand also contain immediate values?
++ bool isSrcOperand(const MCInstrDesc &Desc, unsigned OpNo) const;
++
++ /// \brief Encode an fp or int literal
++ uint32_t getLitEncoding(const MCOperand &MO) const;
++
+public:
+ SIMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri,
+ const MCSubtargetInfo &sti, MCContext &ctx)
@@ -12848,11 +13295,6 @@ index 0000000..c47dc99
+ virtual uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,
+ SmallVectorImpl<MCFixup> &Fixups) const;
+
-+public:
-+
-+ /// \brief Encode a sequence of registers with the correct alignment.
-+ unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
-+
+ /// \brief Encoding for when 2 consecutive registers are used
+ virtual unsigned GPR2AlignEncode(const MCInst &MI, unsigned OpNo,
+ SmallVectorImpl<MCFixup> &Fixup) const;
@@ -12860,73 +13302,142 @@ index 0000000..c47dc99
+  /// \brief Encoding for when 4 consecutive registers are used
+ virtual unsigned GPR4AlignEncode(const MCInst &MI, unsigned OpNo,
+ SmallVectorImpl<MCFixup> &Fixup) const;
++};
+
-+ /// \brief Encoding for SMRD indexed loads
-+ virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+ SmallVectorImpl<MCFixup> &Fixup) const;
++} // End anonymous namespace
++
++MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
++ const MCRegisterInfo &MRI,
++ const MCSubtargetInfo &STI,
++ MCContext &Ctx) {
++ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
++}
+
-+ /// \brief Post-Encoder method for VOP instructions
-+ virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const;
++bool SIMCCodeEmitter::isSrcOperand(const MCInstrDesc &Desc,
++ unsigned OpNo) const {
+
-+private:
++ unsigned RegClass = Desc.OpInfo[OpNo].RegClass;
++ return (AMDGPU::SSrc_32RegClassID == RegClass) ||
++ (AMDGPU::SSrc_64RegClassID == RegClass) ||
++ (AMDGPU::VSrc_32RegClassID == RegClass) ||
++ (AMDGPU::VSrc_64RegClassID == RegClass);
++}
+
-+ /// \returns this SIInstrEncodingType for this instruction.
-+ unsigned getEncodingType(const MCInst &MI) const;
++uint32_t SIMCCodeEmitter::getLitEncoding(const MCOperand &MO) const {
+
-+ /// \brief Get then size in bytes of this instructions encoding.
-+ unsigned getEncodingBytes(const MCInst &MI) const;
++ IntFloatUnion Imm;
++ if (MO.isImm())
++ Imm.I = MO.getImm();
++ else if (MO.isFPImm())
++ Imm.F = MO.getFPImm();
++ else
++ return ~0;
+
-+ /// \returns the hardware encoding for a register
-+ unsigned getRegBinaryCode(unsigned reg) const;
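++  // SI inline-constant encodings: 128..192 for integers 0..64, 193..208 for
++  // -1..-16, 240..247 for common float constants; 255 means the value needs
++  // a separate 32-bit literal dword.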
++ if (Imm.I >= 0 && Imm.I <= 64)
++ return 128 + Imm.I;
+
-+ /// \brief Generated function that returns the hardware encoding for
-+ /// a register
-+ unsigned getHWRegNum(unsigned reg) const;
++ if (Imm.I >= -16 && Imm.I <= -1)
++ return 192 + abs(Imm.I);
+
-+};
++ if (Imm.F == 0.5f)
++ return 240;
+
-+} // End anonymous namespace
++ if (Imm.F == -0.5f)
++ return 241;
+
-+MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
-+ const MCRegisterInfo &MRI,
-+ const MCSubtargetInfo &STI,
-+ MCContext &Ctx) {
-+ return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
++ if (Imm.F == 1.0f)
++ return 242;
++
++ if (Imm.F == -1.0f)
++ return 243;
++
++ if (Imm.F == 2.0f)
++ return 244;
++
++ if (Imm.F == -2.0f)
++ return 245;
++
++ if (Imm.F == 4.0f)
++ return 246;
++
++  if (Imm.F == -4.0f)
++ return 247;
++
++ return 255;
+}
+
+void SIMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
+ SmallVectorImpl<MCFixup> &Fixups) const {
++
+ uint64_t Encoding = getBinaryCodeForInstr(MI, Fixups);
-+ unsigned bytes = getEncodingBytes(MI);
++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
++ unsigned bytes = Desc.getSize();
++
+ for (unsigned i = 0; i < bytes; i++) {
+ OS.write((uint8_t) ((Encoding >> (8 * i)) & 0xff));
+ }
++
++ if (bytes > 4)
++ return;
++
++ // Check for additional literals in SRC0/1/2 (Op 1/2/3)
++ for (unsigned i = 0, e = MI.getNumOperands(); i < e; ++i) {
++
++ // Check if this operand should be encoded as [SV]Src
++ if (!isSrcOperand(Desc, i))
++ continue;
++
++ // Is this operand a literal immediate?
++ const MCOperand &Op = MI.getOperand(i);
++ if (getLitEncoding(Op) != 255)
++ continue;
++
++ // Yes! Encode it
++ IntFloatUnion Imm;
++ if (Op.isImm())
++ Imm.I = Op.getImm();
++ else
++ Imm.F = Op.getFPImm();
++
++ for (unsigned j = 0; j < 4; j++) {
++ OS.write((uint8_t) ((Imm.I >> (8 * j)) & 0xff));
++ }
++
++ // Only one literal value allowed
++ break;
++ }
+}
+
+uint64_t SIMCCodeEmitter::getMachineOpValue(const MCInst &MI,
+ const MCOperand &MO,
+ SmallVectorImpl<MCFixup> &Fixups) const {
-+ if (MO.isReg()) {
-+ return getRegBinaryCode(MO.getReg());
-+ } else if (MO.isImm()) {
-+ return MO.getImm();
-+ } else if (MO.isFPImm()) {
-+ // XXX: Not all instructions can use inline literals
-+ // XXX: We should make sure this is a 32-bit constant
-+ union {
-+ float F;
-+ uint32_t I;
-+ } Imm;
-+ Imm.F = MO.getFPImm();
-+ return Imm.I;
-+ } else if (MO.isExpr()) {
++ if (MO.isReg())
++ return MRI.getEncodingValue(MO.getReg());
++
++ if (MO.isExpr()) {
+ const MCExpr *Expr = MO.getExpr();
+ MCFixupKind Kind = MCFixupKind(FK_PCRel_4);
+ Fixups.push_back(MCFixup::Create(0, Expr, Kind, MI.getLoc()));
+ return 0;
-+ } else{
-+ llvm_unreachable("Encoding of this operand type is not supported yet.");
+ }
++
++ // Figure out the operand number, needed for isSrcOperand check
++ unsigned OpNo = 0;
++ for (unsigned e = MI.getNumOperands(); OpNo < e; ++OpNo) {
++ if (&MO == &MI.getOperand(OpNo))
++ break;
++ }
++
++ const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
++ if (isSrcOperand(Desc, OpNo)) {
++ uint32_t Enc = getLitEncoding(MO);
++ if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4))
++ return Enc;
++
++ } else if (MO.isImm())
++ return MO.getImm();
++
++ llvm_unreachable("Encoding of this operand type is not supported yet.");
+ return 0;
+}
+
@@ -12936,10 +13447,10 @@ index 0000000..c47dc99
+
+unsigned SIMCCodeEmitter::GPRAlign(const MCInst &MI, unsigned OpNo,
+ unsigned shift) const {
-+ unsigned regCode = getRegBinaryCode(MI.getOperand(OpNo).getReg());
-+ return regCode >> shift;
-+ return 0;
++ unsigned regCode = MRI.getEncodingValue(MI.getOperand(OpNo).getReg());
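++  // Only the low byte holds the register index; strip any flag bits above it
++  // before applying the alignment shift.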
++ return (regCode & 0xff) >> shift;
+}
++
+unsigned SIMCCodeEmitter::GPR2AlignEncode(const MCInst &MI,
+ unsigned OpNo ,
+ SmallVectorImpl<MCFixup> &Fixup) const {
@@ -12951,120 +13462,6 @@ index 0000000..c47dc99
+ SmallVectorImpl<MCFixup> &Fixup) const {
+ return GPRAlign(MI, OpNo, 2);
+}
-+
-+#define SMRD_OFFSET_MASK 0xff
-+#define SMRD_IMM_SHIFT 8
-+#define SMRD_SBASE_MASK 0x3f
-+#define SMRD_SBASE_SHIFT 9
-+/// This function is responsibe for encoding the offset
-+/// and the base ptr for SMRD instructions it should return a bit string in
-+/// this format:
-+///
-+/// OFFSET = bits{7-0}
-+/// IMM = bits{8}
-+/// SBASE = bits{14-9}
-+///
-+uint32_t SIMCCodeEmitter::SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+ SmallVectorImpl<MCFixup> &Fixup) const {
-+ uint32_t Encoding;
-+
-+ const MCOperand &OffsetOp = MI.getOperand(OpNo + 1);
-+
-+ //XXX: Use this function for SMRD loads with register offsets
-+ assert(OffsetOp.isImm());
-+
-+ Encoding =
-+ (getMachineOpValue(MI, OffsetOp, Fixup) & SMRD_OFFSET_MASK)
-+ | (1 << SMRD_IMM_SHIFT) //XXX If the Offset is a register we shouldn't set this bit
-+ | ((GPR2AlignEncode(MI, OpNo, Fixup) & SMRD_SBASE_MASK) << SMRD_SBASE_SHIFT)
-+ ;
-+
-+ return Encoding;
-+}
-+
-+//===----------------------------------------------------------------------===//
-+// Post Encoder Callbacks
-+//===----------------------------------------------------------------------===//
-+
-+uint64_t SIMCCodeEmitter::VOPPostEncode(const MCInst &MI, uint64_t Value) const{
-+ unsigned encodingType = getEncodingType(MI);
-+ unsigned numSrcOps;
-+ unsigned vgprBitOffset;
-+
-+ if (encodingType == SIInstrEncodingType::VOP3) {
-+ numSrcOps = 3;
-+ vgprBitOffset = 32;
-+ } else {
-+ numSrcOps = 1;
-+ vgprBitOffset = 0;
-+ }
-+
-+ // Add one to skip over the destination reg operand.
-+ for (unsigned opIdx = 1; opIdx < numSrcOps + 1; opIdx++) {
-+ const MCOperand &MO = MI.getOperand(opIdx);
-+ if (MO.isReg()) {
-+ unsigned reg = MI.getOperand(opIdx).getReg();
-+ if (AMDGPUMCRegisterClasses[AMDGPU::VReg_32RegClassID].contains(reg) ||
-+ AMDGPUMCRegisterClasses[AMDGPU::VReg_64RegClassID].contains(reg)) {
-+ Value |= (VGPR_BIT(opIdx)) << vgprBitOffset;
-+ }
-+ } else if (MO.isFPImm()) {
-+ union {
-+ float f;
-+ uint32_t i;
-+ } Imm;
-+ // XXX: Not all instructions can use inline literals
-+ // XXX: We should make sure this is a 32-bit constant
-+ Imm.f = MO.getFPImm();
-+ Value |= ((uint64_t)Imm.i) << 32;
-+ }
-+ }
-+ return Value;
-+}
-+
-+//===----------------------------------------------------------------------===//
-+// Encoding helper functions
-+//===----------------------------------------------------------------------===//
-+
-+unsigned SIMCCodeEmitter::getEncodingType(const MCInst &MI) const {
-+ return MCII.get(MI.getOpcode()).TSFlags & SI_INSTR_FLAGS_ENCODING_MASK;
-+}
-+
-+unsigned SIMCCodeEmitter::getEncodingBytes(const MCInst &MI) const {
-+
-+ // These instructions aren't real instructions with an encoding type, so
-+ // we need to manually specify their size.
-+ switch (MI.getOpcode()) {
-+ default: break;
-+ case AMDGPU::SI_LOAD_LITERAL_I32:
-+ case AMDGPU::SI_LOAD_LITERAL_F32:
-+ return 4;
-+ }
-+
-+ unsigned encoding_type = getEncodingType(MI);
-+ switch (encoding_type) {
-+ case SIInstrEncodingType::EXP:
-+ case SIInstrEncodingType::LDS:
-+ case SIInstrEncodingType::MUBUF:
-+ case SIInstrEncodingType::MTBUF:
-+ case SIInstrEncodingType::MIMG:
-+ case SIInstrEncodingType::VOP3:
-+ return 8;
-+ default:
-+ return 4;
-+ }
-+}
-+
-+
-+unsigned SIMCCodeEmitter::getRegBinaryCode(unsigned reg) const {
-+ switch (reg) {
-+ case AMDGPU::M0: return 124;
-+ case AMDGPU::SREG_LIT_0: return 128;
-+ case AMDGPU::SI_LITERAL_CONSTANT: return 255;
-+ default: return MRI.getEncodingValue(reg);
-+ }
-+}
-+
diff --git a/lib/Target/R600/Makefile b/lib/Target/R600/Makefile
new file mode 100644
index 0000000..1b3ebbe
@@ -13096,10 +13493,10 @@ index 0000000..1b3ebbe
+include $(LEVEL)/Makefile.common
diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td
new file mode 100644
-index 0000000..3dc1ecd
+index 0000000..868810c
--- /dev/null
+++ b/lib/Target/R600/Processors.td
-@@ -0,0 +1,29 @@
+@@ -0,0 +1,30 @@
+//===-- Processors.td - TODO: Add brief description -------===//
+//
+// The LLVM Compiler Infrastructure
@@ -13115,6 +13512,7 @@ index 0000000..3dc1ecd
+
+class Proc<string Name, ProcessorItineraries itin, list<SubtargetFeature> Features>
+: Processor<Name, itin, Features>;
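++// An empty processor name covers the default case when no -mcpu is supplied.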
++def : Proc<"", R600_EG_Itin, [FeatureR600ALUInst]>;
+def : Proc<"r600", R600_EG_Itin, [FeatureR600ALUInst]>;
+def : Proc<"rv710", R600_EG_Itin, []>;
+def : Proc<"rv730", R600_EG_Itin, []>;
@@ -13131,10 +13529,10 @@ index 0000000..3dc1ecd
+
diff --git a/lib/Target/R600/R600Defines.h b/lib/Target/R600/R600Defines.h
new file mode 100644
-index 0000000..7dea8e4
+index 0000000..16cfcf5
--- /dev/null
+++ b/lib/Target/R600/R600Defines.h
-@@ -0,0 +1,79 @@
+@@ -0,0 +1,97 @@
+//===-- R600Defines.h - R600 Helper Macros ----------------------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -13186,6 +13584,9 @@ index 0000000..7dea8e4
+#define HW_REG_MASK 0x1ff
+#define HW_CHAN_SHIFT 9
+
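++// The hardware encoding packs a 9-bit register index with the channel in the
++// bits above it; these helpers extract each part.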
++#define GET_REG_CHAN(reg) ((reg) >> HW_CHAN_SHIFT)
++#define GET_REG_INDEX(reg) ((reg) & HW_REG_MASK)
++
+namespace R600Operands {
+ enum Ops {
+ DST,
@@ -13199,27 +13600,42 @@ index 0000000..7dea8e4
+ SRC0_NEG,
+ SRC0_REL,
+ SRC0_ABS,
++ SRC0_SEL,
+ SRC1,
+ SRC1_NEG,
+ SRC1_REL,
+ SRC1_ABS,
++ SRC1_SEL,
+ SRC2,
+ SRC2_NEG,
+ SRC2_REL,
++ SRC2_SEL,
+ LAST,
+ PRED_SEL,
+ IMM,
+ COUNT
+ };
++
++ const static int ALUOpTable[3][R600Operands::COUNT] = {
++// W C S S S S S S S S S S S
++// R O D L S R R R R S R R R R S R R R L P
++// D U I M R A R C C C C R C C C C R C C C A R I
++// S E U T O E M C 0 0 0 0 C 1 1 1 1 C 2 2 2 S E M
++// T M P E D L P 0 N R A S 1 N R A S 2 N R S T D M
++ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12},
++  {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,-1,-1,-1,-1,17,18,19},
++ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8, 9,-1,10,11,12,13,14,15,16,17}
++ };
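++  // Each row maps the logical operand enums to operand positions for one
++  // encoding form (one, two or three sources); -1 marks operands the form
++  // lacks.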
++
+}
+
+#endif // R600DEFINES_H_
diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
new file mode 100644
-index 0000000..b6e62b7
+index 0000000..c00c349
--- /dev/null
+++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
-@@ -0,0 +1,334 @@
+@@ -0,0 +1,290 @@
+//===-- R600ExpandSpecialInstrs.cpp - Expand special instructions ---------===//
+//
+// The LLVM Compiler Infrastructure
@@ -13277,118 +13693,6 @@ index 0000000..b6e62b7
+ return new R600ExpandSpecialInstrsPass(TM);
+}
+
-+bool R600ExpandSpecialInstrsPass::ExpandInputPerspective(MachineInstr &MI) {
-+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
-+ if (MI.getOpcode() != AMDGPU::input_perspective)
-+ return false;
-+
-+ MachineBasicBlock::iterator I = &MI;
-+ unsigned DstReg = MI.getOperand(0).getReg();
-+ R600MachineFunctionInfo *MFI = MI.getParent()->getParent()
-+ ->getInfo<R600MachineFunctionInfo>();
-+ unsigned IJIndexBase;
-+
-+ // In Evergreen ISA doc section 8.3.2 :
-+ // We need to interpolate XY and ZW in two different instruction groups.
-+ // An INTERP_* must occupy all 4 slots of an instruction group.
-+ // Output of INTERP_XY is written in X,Y slots
-+ // Output of INTERP_ZW is written in Z,W slots
-+ //
-+ // Thus interpolation requires the following sequences :
-+ //
-+ // AnyGPR.x = INTERP_ZW; (Write Masked Out)
-+ // AnyGPR.y = INTERP_ZW; (Write Masked Out)
-+ // DstGPR.z = INTERP_ZW;
-+ // DstGPR.w = INTERP_ZW; (End of first IG)
-+ // DstGPR.x = INTERP_XY;
-+ // DstGPR.y = INTERP_XY;
-+ // AnyGPR.z = INTERP_XY; (Write Masked Out)
-+ // AnyGPR.w = INTERP_XY; (Write Masked Out) (End of second IG)
-+ //
-+ switch (MI.getOperand(1).getImm()) {
-+ case 0:
-+ IJIndexBase = MFI->GetIJPerspectiveIndex();
-+ break;
-+ case 1:
-+ IJIndexBase = MFI->GetIJLinearIndex();
-+ break;
-+ default:
-+ assert(0 && "Unknow ij index");
-+ }
-+
-+ for (unsigned i = 0; i < 8; i++) {
-+ unsigned IJIndex = AMDGPU::R600_TReg32RegClass.getRegister(
-+ 2 * IJIndexBase + ((i + 1) % 2));
-+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
-+ MI.getOperand(2).getImm());
-+
-+
-+ unsigned Sel = AMDGPU::sel_x;
-+ switch (i % 4) {
-+ case 0:Sel = AMDGPU::sel_x;break;
-+ case 1:Sel = AMDGPU::sel_y;break;
-+ case 2:Sel = AMDGPU::sel_z;break;
-+ case 3:Sel = AMDGPU::sel_w;break;
-+ default:break;
-+ }
-+
-+ unsigned Res = TRI.getSubReg(DstReg, Sel);
-+
-+ unsigned Opcode = (i < 4)?AMDGPU::INTERP_ZW:AMDGPU::INTERP_XY;
-+
-+ MachineBasicBlock &MBB = *(MI.getParent());
-+ MachineInstr *NewMI =
-+ TII->buildDefaultInstruction(MBB, I, Opcode, Res, IJIndex, ReadReg);
-+
-+ if (!(i> 1 && i < 6)) {
-+ TII->addFlag(NewMI, 0, MO_FLAG_MASK);
-+ }
-+
-+ if (i % 4 != 3)
-+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
-+ }
-+
-+ MI.eraseFromParent();
-+
-+ return true;
-+}
-+
-+bool R600ExpandSpecialInstrsPass::ExpandInputConstant(MachineInstr &MI) {
-+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
-+ if (MI.getOpcode() != AMDGPU::input_constant)
-+ return false;
-+
-+ MachineBasicBlock::iterator I = &MI;
-+ unsigned DstReg = MI.getOperand(0).getReg();
-+
-+ for (unsigned i = 0; i < 4; i++) {
-+ unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
-+ MI.getOperand(1).getImm());
-+
-+ unsigned Sel = AMDGPU::sel_x;
-+ switch (i % 4) {
-+ case 0:Sel = AMDGPU::sel_x;break;
-+ case 1:Sel = AMDGPU::sel_y;break;
-+ case 2:Sel = AMDGPU::sel_z;break;
-+ case 3:Sel = AMDGPU::sel_w;break;
-+ default:break;
-+ }
-+
-+ unsigned Res = TRI.getSubReg(DstReg, Sel);
-+
-+ MachineBasicBlock &MBB = *(MI.getParent());
-+ MachineInstr *NewMI = TII->buildDefaultInstruction(
-+ MBB, I, AMDGPU::INTERP_LOAD_P0, Res, ReadReg);
-+
-+ if (i % 4 != 3)
-+ TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
-+ }
-+
-+ MI.eraseFromParent();
-+
-+ return true;
-+}
-+
+bool R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF) {
+
+ const R600RegisterInfo &TRI = TII->getRegisterInfo();
@@ -13422,7 +13726,7 @@ index 0000000..b6e62b7
+ MI.eraseFromParent();
+ continue;
+ }
-+ case AMDGPU::BREAK:
++ case AMDGPU::BREAK: {
+ MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,
+ AMDGPU::PRED_SETE_INT,
+ AMDGPU::PREDICATE_BIT,
@@ -13436,12 +13740,81 @@ index 0000000..b6e62b7
+ .addReg(AMDGPU::PREDICATE_BIT);
+ MI.eraseFromParent();
+ continue;
-+ }
++ }
+
-+ if (ExpandInputPerspective(MI))
-+ continue;
-+ if (ExpandInputConstant(MI))
-+ continue;
++ case AMDGPU::INTERP_PAIR_XY: {
++ MachineInstr *BMI;
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++ MI.getOperand(2).getImm());
++
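++        // INTERP_XY only produces the X and Y lanes; the Z and W copies are
++        // emitted write-masked so the bundle still fills all four ALU slots.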
++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
++ unsigned DstReg;
++
++ if (Chan < 2)
++ DstReg = MI.getOperand(Chan).getReg();
++ else
++ DstReg = Chan == 2 ? AMDGPU::T0_Z : AMDGPU::T0_W;
++
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_XY,
++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
++
++ BMI->setIsInsideBundle(Chan > 0);
++ if (Chan >= 2)
++ TII->addFlag(BMI, 0, MO_FLAG_MASK);
++ if (Chan != 3)
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++ }
++
++ MI.eraseFromParent();
++ continue;
++ }
++
++ case AMDGPU::INTERP_PAIR_ZW: {
++ MachineInstr *BMI;
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++ MI.getOperand(2).getImm());
++
++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
++ unsigned DstReg;
++
++ if (Chan < 2)
++ DstReg = Chan == 0 ? AMDGPU::T0_X : AMDGPU::T0_Y;
++ else
++ DstReg = MI.getOperand(Chan-2).getReg();
++
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_ZW,
++ DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
++
++ BMI->setIsInsideBundle(Chan > 0);
++ if (Chan < 2)
++ TII->addFlag(BMI, 0, MO_FLAG_MASK);
++ if (Chan != 3)
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++ }
++
++ MI.eraseFromParent();
++ continue;
++ }
++
++ case AMDGPU::INTERP_VEC_LOAD: {
++ const R600RegisterInfo &TRI = TII->getRegisterInfo();
++ MachineInstr *BMI;
++ unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++ MI.getOperand(1).getImm());
++ unsigned DstReg = MI.getOperand(0).getReg();
++
++ for (unsigned Chan = 0; Chan < 4; ++Chan) {
++ BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_LOAD_P0,
++ TRI.getSubReg(DstReg, TRI.getSubRegFromChannel(Chan)), PReg);
++ BMI->setIsInsideBundle(Chan > 0);
++ if (Chan != 3)
++ TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++ }
++
++ MI.eraseFromParent();
++ continue;
++ }
++ }
+
+ bool IsReduction = TII->isReductionOp(MI.getOpcode());
+ bool IsVector = TII->isVector(MI);
@@ -13540,8 +13913,7 @@ index 0000000..b6e62b7
+ MachineInstr *NewMI =
+ TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);
+
-+ if (Chan != 0)
-+ NewMI->bundleWithPred();
++ NewMI->setIsInsideBundle(Chan != 0);
+ if (Mask) {
+ TII->addFlag(NewMI, 0, MO_FLAG_MASK);
+ }
@@ -13556,10 +13928,10 @@ index 0000000..b6e62b7
+}
diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
new file mode 100644
-index 0000000..d6b9d90
+index 0000000..9c38522
--- /dev/null
+++ b/lib/Target/R600/R600ISelLowering.cpp
-@@ -0,0 +1,909 @@
+@@ -0,0 +1,1195 @@
+//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//
+//
+// The LLVM Compiler Infrastructure
@@ -13580,6 +13952,7 @@ index 0000000..d6b9d90
+#include "R600MachineFunctionInfo.h"
+#include "llvm/Argument.h"
+#include "llvm/Function.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/SelectionDAG.h"
@@ -13633,10 +14006,27 @@ index 0000000..d6b9d90
+ setOperationAction(ISD::SELECT, MVT::i32, Custom);
+ setOperationAction(ISD::SELECT, MVT::f32, Custom);
+
++ // Legalize loads and stores to the private address space.
++ setOperationAction(ISD::LOAD, MVT::i32, Custom);
++ setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
++ setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
++ setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom);
++ setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom);
++ setLoadExtAction(ISD::ZEXTLOAD, MVT::i8, Custom);
++ setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom);
++ setOperationAction(ISD::STORE, MVT::i8, Custom);
+ setOperationAction(ISD::STORE, MVT::i32, Custom);
++ setOperationAction(ISD::STORE, MVT::v2i32, Custom);
+ setOperationAction(ISD::STORE, MVT::v4i32, Custom);
+
++ setOperationAction(ISD::FrameIndex, MVT::i32, Custom);
++
+ setTargetDAGCombine(ISD::FP_ROUND);
++ setTargetDAGCombine(ISD::FP_TO_SINT);
++ setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
++ setTargetDAGCombine(ISD::SELECT_CC);
+
+ setSchedulingPreference(Sched::VLIW);
+}
@@ -13677,15 +14067,6 @@ index 0000000..d6b9d90
+ break;
+ }
+
-+ case AMDGPU::R600_LOAD_CONST: {
-+ int64_t RegIndex = MI->getOperand(1).getImm();
-+ unsigned ConstantReg = AMDGPU::R600_CReg32RegClass.getRegister(RegIndex);
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::COPY))
-+ .addOperand(MI->getOperand(0))
-+ .addReg(ConstantReg);
-+ break;
-+ }
-+
+ case AMDGPU::MASK_WRITE: {
+ unsigned maskedRegister = MI->getOperand(0).getReg();
+ assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));
@@ -13716,18 +14097,6 @@ index 0000000..d6b9d90
+ break;
+ }
+
-+ case AMDGPU::RESERVE_REG: {
-+ R600MachineFunctionInfo * MFI = MF->getInfo<R600MachineFunctionInfo>();
-+ int64_t ReservedIndex = MI->getOperand(0).getImm();
-+ unsigned ReservedReg =
-+ AMDGPU::R600_TReg32RegClass.getRegister(ReservedIndex);
-+ MFI->ReservedRegs.push_back(ReservedReg);
-+ unsigned SuperReg =
-+ AMDGPU::R600_Reg128RegClass.getRegister(ReservedIndex / 4);
-+ MFI->ReservedRegs.push_back(SuperReg);
-+ break;
-+ }
-+
+ case AMDGPU::TXD: {
+ unsigned T0 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
+ unsigned T1 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
@@ -13812,33 +14181,26 @@ index 0000000..d6b9d90
+ break;
+ }
+
-+ case AMDGPU::input_perspective: {
-+ R600MachineFunctionInfo *MFI = MF->getInfo<R600MachineFunctionInfo>();
-+
-+ // XXX Be more fine about register reservation
-+ for (unsigned i = 0; i < 4; i ++) {
-+ unsigned ReservedReg = AMDGPU::R600_TReg32RegClass.getRegister(i);
-+ MFI->ReservedRegs.push_back(ReservedReg);
-+ }
-+
-+ switch (MI->getOperand(1).getImm()) {
-+ case 0:// Perspective
-+ MFI->HasPerspectiveInterpolation = true;
-+ break;
-+ case 1:// Linear
-+ MFI->HasLinearInterpolation = true;
-+ break;
-+ default:
-+ assert(0 && "Unknow ij index");
-+ }
-+
-+ return BB;
-+ }
-+
+ case AMDGPU::EG_ExportSwz:
+ case AMDGPU::R600_ExportSwz: {
++    // The instruction is left unmodified if it's not the last one of its type.
++ bool isLastInstructionOfItsType = true;
++ unsigned InstExportType = MI->getOperand(1).getImm();
++ for (MachineBasicBlock::iterator NextExportInst = llvm::next(I),
++ EndBlock = BB->end(); NextExportInst != EndBlock;
++ NextExportInst = llvm::next(NextExportInst)) {
++ if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz ||
++ NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) {
++ unsigned CurrentInstExportType = NextExportInst->getOperand(1)
++ .getImm();
++ if (CurrentInstExportType == InstExportType) {
++ isLastInstructionOfItsType = false;
++ break;
++ }
++ }
++ }
+ bool EOP = (llvm::next(I)->getOpcode() == AMDGPU::RETURN)? 1 : 0;
-+ if (!EOP)
++ if (!EOP && !isLastInstructionOfItsType)
+ return BB;
+ unsigned CfInst = (MI->getOpcode() == AMDGPU::EG_ExportSwz)? 84 : 40;
+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI->getOpcode()))
@@ -13850,7 +14212,7 @@ index 0000000..d6b9d90
+ .addOperand(MI->getOperand(5))
+ .addOperand(MI->getOperand(6))
+ .addImm(CfInst)
-+ .addImm(1);
++ .addImm(EOP);
+ break;
+ }
+ }
@@ -13926,7 +14288,9 @@ index 0000000..d6b9d90
+ case ISD::SELECT: return LowerSELECT(Op, DAG);
+ case ISD::SETCC: return LowerSETCC(Op, DAG);
+ case ISD::STORE: return LowerSTORE(Op, DAG);
++ case ISD::LOAD: return LowerLOAD(Op, DAG);
+ case ISD::FPOW: return LowerFPOW(Op, DAG);
++ case ISD::FrameIndex: return LowerFrameIndex(Op, DAG);
+ case ISD::INTRINSIC_VOID: {
+ SDValue Chain = Op.getOperand(0);
+ unsigned IntrinsicID =
@@ -13953,39 +14317,7 @@ index 0000000..d6b9d90
+ Chain);
+
+ }
-+ case AMDGPUIntrinsic::R600_store_stream_output : {
-+ MachineFunction &MF = DAG.getMachineFunction();
-+ R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();
-+ int64_t RegIndex = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue();
-+ int64_t BufIndex = cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue();
-+
-+ SDNode **OutputsMap = MFI->StreamOutputs[BufIndex];
-+ unsigned Inst;
-+ switch (cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue() ) {
-+ // STREAM3
-+ case 3:
-+ Inst = 4;
-+ break;
-+ // STREAM2
-+ case 2:
-+ Inst = 3;
-+ break;
-+ // STREAM1
-+ case 1:
-+ Inst = 2;
-+ break;
-+ // STREAM0
-+ case 0:
-+ Inst = 1;
-+ break;
-+ default:
-+ assert(0 && "Wrong buffer id for stream outputs !");
-+ }
+
-+ return InsertScalarToRegisterExport(DAG, Op.getDebugLoc(), OutputsMap,
-+ RegIndex / 4, RegIndex % 4, Inst, 0, Op.getOperand(2),
-+ Chain);
-+ }
+ // default for switch(IntrinsicID)
+ default: break;
+ }
@@ -14004,38 +14336,35 @@ index 0000000..d6b9d90
+ unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister(RegIndex);
+ return CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, Reg, VT);
+ }
-+ case AMDGPUIntrinsic::R600_load_input_perspective: {
-+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+ if (slot < 0)
-+ return DAG.getUNDEF(MVT::f32);
-+ SDValue FullVector = DAG.getNode(
-+ AMDGPUISD::INTERP,
-+ DL, MVT::v4f32,
-+ DAG.getConstant(0, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
-+ }
-+ case AMDGPUIntrinsic::R600_load_input_linear: {
-+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+ if (slot < 0)
-+ return DAG.getUNDEF(MVT::f32);
-+ SDValue FullVector = DAG.getNode(
-+ AMDGPUISD::INTERP,
-+ DL, MVT::v4f32,
-+ DAG.getConstant(1, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
-+ }
-+ case AMDGPUIntrinsic::R600_load_input_constant: {
++
++ case AMDGPUIntrinsic::R600_interp_input: {
+ int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+ if (slot < 0)
-+ return DAG.getUNDEF(MVT::f32);
-+ SDValue FullVector = DAG.getNode(
-+ AMDGPUISD::INTERP_P0,
-+ DL, MVT::v4f32,
-+ DAG.getConstant(slot / 4 , MVT::i32));
-+ return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+ DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
++ int ijb = cast<ConstantSDNode>(Op.getOperand(2))->getSExtValue();
++ MachineSDNode *interp;
++ if (ijb < 0) {
++ interp = DAG.getMachineNode(AMDGPU::INTERP_VEC_LOAD, DL,
++ MVT::v4f32, DAG.getTargetConstant(slot / 4 , MVT::i32));
++ return DAG.getTargetExtractSubreg(
++ TII->getRegisterInfo().getSubRegFromChannel(slot % 4),
++ DL, MVT::f32, SDValue(interp, 0));
++ }
++
++ if (slot % 4 < 2)
++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_XY, DL,
++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
++ else
++ interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_ZW, DL,
++ MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
++ CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++ AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
++
++ return SDValue(interp, slot % 2);
+ }
+
+ case r600_read_ngroups_x:
@@ -14089,6 +14418,20 @@ index 0000000..d6b9d90
+ switch (N->getOpcode()) {
+ default: return;
+ case ISD::FP_TO_UINT: Results.push_back(LowerFPTOUINT(N->getOperand(0), DAG));
++ return;
++ case ISD::LOAD: {
++ SDNode *Node = LowerLOAD(SDValue(N, 0), DAG).getNode();
++ Results.push_back(SDValue(Node, 0));
++ Results.push_back(SDValue(Node, 1));
++ // XXX: LLVM seems not to replace Chain Value inside CustomWidenLowerNode
++ // function
++ DAG.ReplaceAllUsesOfValueWith(SDValue(N,1), SDValue(Node, 1));
++ return;
++ }
++  case ISD::STORE: {
++    SDNode *Node = LowerSTORE(SDValue(N, 0), DAG).getNode();
++    Results.push_back(SDValue(Node, 0));
++    return;
++  }
+ }
+}
+
@@ -14156,6 +14499,20 @@ index 0000000..d6b9d90
+ false, false, false, 0);
+}
+
++SDValue R600TargetLowering::LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const {
++
++ MachineFunction &MF = DAG.getMachineFunction();
++ const AMDGPUFrameLowering *TFL =
++ static_cast<const AMDGPUFrameLowering*>(getTargetMachine().getFrameLowering());
++
++ FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Op);
++ assert(FIN);
++
++ unsigned FrameIndex = FIN->getIndex();
++ unsigned Offset = TFL->getFrameIndexOffset(MF, FrameIndex);
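++  // The frame-index offset counts register elements; scale by 4 bytes per
++  // sub-register times the stack width to recover a byte address.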
++ return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), MVT::i32);
++}
++
+SDValue R600TargetLowering::LowerROTL(SDValue Op, SelectionDAG &DAG) const {
+ DebugLoc DL = Op.getDebugLoc();
+ EVT VT = Op.getValueType();
@@ -14242,9 +14599,12 @@ index 0000000..d6b9d90
+ }
+
+ // Try to lower to a SET* instruction:
-+ // We need all the operands of SELECT_CC to have the same value type, so if
-+ // necessary we need to change True and False to be the same type as LHS and
-+ // RHS, and then convert the result of the select_cc back to the correct type.
++ //
++ // CompareVT == MVT::f32 and VT == MVT::i32 is supported by the hardware,
++ // but for the other case where CompareVT != VT, all operands of
++ // SELECT_CC need to have the same value type, so we need to change True and
++ // False to be the same type as LHS and RHS, and then convert the result of
++ // the select_cc back to the correct type.
+
+ // Move hardware True/False values to the correct operand.
+ if (isHWTrueValue(False) && isHWFalseValue(True)) {
@@ -14254,32 +14614,17 @@ index 0000000..d6b9d90
+ }
+
+ if (isHWTrueValue(True) && isHWFalseValue(False)) {
-+ if (CompareVT != VT) {
-+ if (VT == MVT::f32 && CompareVT == MVT::i32) {
-+ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-+ LHS, RHS,
-+ DAG.getConstant(-1, MVT::i32),
-+ DAG.getConstant(0, MVT::i32),
-+ CC);
-+ // Convert integer values of true (-1) and false (0) to fp values of
-+ // true (1.0f) and false (0.0f).
-+ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
-+ DAG.getConstant(1, MVT::i32));
-+ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
-+ } else if (VT == MVT::i32 && CompareVT == MVT::f32) {
-+ SDValue BoolAsFlt = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-+ LHS, RHS,
-+ DAG.getConstantFP(1.0f, MVT::f32),
-+ DAG.getConstantFP(0.0f, MVT::f32),
-+ CC);
-+ // Convert fp values of true (1.0f) and false (0.0f) to integer values
-+ // of true (-1) and false (0).
-+ SDValue Neg = DAG.getNode(ISD::FNEG, DL, MVT::f32, BoolAsFlt);
-+ return DAG.getNode(ISD::FP_TO_SINT, DL, VT, Neg);
-+ } else {
-+ // I don't think there will be any other type pairings.
-+ assert(!"Unhandled operand type parings in SELECT_CC");
-+ }
++ if (CompareVT != VT && VT == MVT::f32 && CompareVT == MVT::i32) {
++ SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
++ LHS, RHS,
++ DAG.getConstant(-1, MVT::i32),
++ DAG.getConstant(0, MVT::i32),
++ CC);
++ // Convert integer values of true (-1) and false (0) to fp values of
++ // true (1.0f) and false (0.0f).
++ SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
++ DAG.getConstant(1, MVT::i32));
++ return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
+ } else {
+ // This SELECT_CC is already legal.
+ return DAG.getNode(ISD::SELECT_CC, DL, VT, LHS, RHS, True, False, CC);
@@ -14370,6 +14715,61 @@ index 0000000..d6b9d90
+ return Cond;
+}
+
++/// LLVM generates byte-addressed pointers. For indirect addressing, we need
++/// to convert these pointers to a register index. Each register holds
++/// 16 bytes (4 x 32-bit sub-registers), but we need to take into account the
++/// \p StackWidth, which tells us how many of the 4 sub-registers will be used
++/// for indirect addressing.
++SDValue R600TargetLowering::stackPtrToRegIndex(SDValue Ptr,
++ unsigned StackWidth,
++ SelectionDAG &DAG) const {
++ unsigned SRLPad;
++ switch(StackWidth) {
++ case 1:
++ SRLPad = 2;
++ break;
++ case 2:
++ SRLPad = 3;
++ break;
++ case 4:
++ SRLPad = 4;
++ break;
++ default: llvm_unreachable("Invalid stack width");
++ }
++
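++  // SRLPad is log2(4 * StackWidth): each register index spans four bytes per
++  // sub-register in use.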
++ return DAG.getNode(ISD::SRL, Ptr.getDebugLoc(), Ptr.getValueType(), Ptr,
++ DAG.getConstant(SRLPad, MVT::i32));
++}
++
++void R600TargetLowering::getStackAddress(unsigned StackWidth,
++ unsigned ElemIdx,
++ unsigned &Channel,
++ unsigned &PtrIncr) const {
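++  // PtrIncr is a relative increment; the caller accumulates it between
++  // consecutive elements, so only the element that crosses into the next
++  // register reports 1.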
++ switch (StackWidth) {
++ default:
++ case 1:
++ Channel = 0;
++ if (ElemIdx > 0) {
++ PtrIncr = 1;
++ } else {
++ PtrIncr = 0;
++ }
++ break;
++ case 2:
++ Channel = ElemIdx % 2;
++ if (ElemIdx == 2) {
++ PtrIncr = 1;
++ } else {
++ PtrIncr = 0;
++ }
++ break;
++ case 4:
++ Channel = ElemIdx;
++ PtrIncr = 0;
++ break;
++ }
++}
++
+SDValue R600TargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
+ DebugLoc DL = Op.getDebugLoc();
+ StoreSDNode *StoreNode = cast<StoreSDNode>(Op);
@@ -14391,23 +14791,202 @@ index 0000000..d6b9d90
+ }
+ return Chain;
+ }
-+ return SDValue();
-+}
+
++ EVT ValueVT = Value.getValueType();
+
-+SDValue R600TargetLowering::LowerFPOW(SDValue Op,
-+ SelectionDAG &DAG) const {
-+ DebugLoc DL = Op.getDebugLoc();
-+ EVT VT = Op.getValueType();
-+ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
-+ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
-+ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
++ if (StoreNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
++ return SDValue();
++ }
++
++ // Lowering for indirect addressing
++
++ const MachineFunction &MF = DAG.getMachineFunction();
++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
++ getTargetMachine().getFrameLowering());
++ unsigned StackWidth = TFL->getStackWidth(MF);
++
++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
++
++ if (ValueVT.isVector()) {
++ unsigned NumElemVT = ValueVT.getVectorNumElements();
++ EVT ElemVT = ValueVT.getVectorElementType();
++ SDValue Stores[4];
++
++    assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
++                                      "vector width in store");
++
++ for (unsigned i = 0; i < NumElemVT; ++i) {
++ unsigned Channel, PtrIncr;
++ getStackAddress(StackWidth, i, Channel, PtrIncr);
++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
++ DAG.getConstant(PtrIncr, MVT::i32));
++ SDValue Elem = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ElemVT,
++ Value, DAG.getConstant(i, MVT::i32));
++
++ Stores[i] = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other,
++ Chain, Elem, Ptr,
++ DAG.getTargetConstant(Channel, MVT::i32));
++ }
++ Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Stores, NumElemVT);
++ } else {
++ if (ValueVT == MVT::i8) {
++ Value = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Value);
++ }
++ Chain = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other, Chain, Value, Ptr,
++ DAG.getTargetConstant(0, MVT::i32)); // Channel
++ }
++
++ return Chain;
+}
+
-+/// XXX Only kernel functions are supported, so we can assume for now that
-+/// every function is a kernel function, but in the future we should use
-+/// separate calling conventions for kernel and non-kernel functions.
-+SDValue R600TargetLowering::LowerFormalArguments(
++// return 512 + (kc_bank << 12)
++static int
++ConstantAddressBlock(unsigned AddressSpace) {
++ switch (AddressSpace) {
++ case AMDGPUAS::CONSTANT_BUFFER_0:
++ return 512;
++ case AMDGPUAS::CONSTANT_BUFFER_1:
++ return 512 + 4096;
++ case AMDGPUAS::CONSTANT_BUFFER_2:
++ return 512 + 4096 * 2;
++ case AMDGPUAS::CONSTANT_BUFFER_3:
++ return 512 + 4096 * 3;
++ case AMDGPUAS::CONSTANT_BUFFER_4:
++ return 512 + 4096 * 4;
++ case AMDGPUAS::CONSTANT_BUFFER_5:
++ return 512 + 4096 * 5;
++ case AMDGPUAS::CONSTANT_BUFFER_6:
++ return 512 + 4096 * 6;
++ case AMDGPUAS::CONSTANT_BUFFER_7:
++ return 512 + 4096 * 7;
++ case AMDGPUAS::CONSTANT_BUFFER_8:
++ return 512 + 4096 * 8;
++ case AMDGPUAS::CONSTANT_BUFFER_9:
++ return 512 + 4096 * 9;
++ case AMDGPUAS::CONSTANT_BUFFER_10:
++ return 512 + 4096 * 10;
++ case AMDGPUAS::CONSTANT_BUFFER_11:
++ return 512 + 4096 * 11;
++ case AMDGPUAS::CONSTANT_BUFFER_12:
++ return 512 + 4096 * 12;
++ case AMDGPUAS::CONSTANT_BUFFER_13:
++ return 512 + 4096 * 13;
++ case AMDGPUAS::CONSTANT_BUFFER_14:
++ return 512 + 4096 * 14;
++ case AMDGPUAS::CONSTANT_BUFFER_15:
++ return 512 + 4096 * 15;
++ default:
++ return -1;
++ }
++}
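The switch amounts to 512 + 4096 * kc_bank; a closed-form sketch, assuming the sixteen CONSTANT_BUFFER_* address spaces are numbered 0 through 15:

    // Hypothetical equivalent of ConstantAddressBlock's switch.
    static int constantAddressBlock(unsigned KCBank) {
      if (KCBank > 15)
        return -1;                 // not a constant buffer address space
      return 512 + (KCBank << 12); // 512 + 4096 * kc_bank
    }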
++
++SDValue R600TargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const
++{
++ EVT VT = Op.getValueType();
++ DebugLoc DL = Op.getDebugLoc();
++ LoadSDNode *LoadNode = cast<LoadSDNode>(Op);
++ SDValue Chain = Op.getOperand(0);
++ SDValue Ptr = Op.getOperand(1);
++ SDValue LoweredLoad;
++
++ int ConstantBlock = ConstantAddressBlock(LoadNode->getAddressSpace());
++ if (ConstantBlock > -1) {
++ SDValue Result;
++ if (dyn_cast<ConstantExpr>(LoadNode->getSrcValue()) ||
++ dyn_cast<Constant>(LoadNode->getSrcValue())) {
++ SDValue Slots[4];
++ for (unsigned i = 0; i < 4; i++) {
++        // We want the Const position encoded with the following formula:
++        // (((512 + (kc_bank << 12) + const_index) << 2) + chan)
++        // const_index is Ptr computed by llvm using an alignment of 16.
++        // Thus we add ((512 + (kc_bank << 12)) * 16 + chan * 4) here and
++        // then divide by 4 at the ISel step
++ SDValue NewPtr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr,
++ DAG.getConstant(4 * i + ConstantBlock * 16, MVT::i32));
++ Slots[i] = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::i32, NewPtr);
++ }
++ Result = DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v4i32, Slots, 4);
++ } else {
++      // A non-constant ptr can't be folded; keep it as a v4f32 load
++ Result = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::v4i32,
++ DAG.getNode(ISD::SRL, DL, MVT::i32, Ptr, DAG.getConstant(4, MVT::i32))
++ );
++ }
++
++ if (!VT.isVector()) {
++ Result = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Result,
++ DAG.getConstant(0, MVT::i32));
++ }
++
++ SDValue MergedValues[2] = {
++ Result,
++ Chain
++ };
++ return DAG.getMergeValues(MergedValues, 2, DL);
++ }
++
++ if (LoadNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
++ return SDValue();
++ }
++
++ // Lowering for indirect addressing
++ const MachineFunction &MF = DAG.getMachineFunction();
++ const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
++ getTargetMachine().getFrameLowering());
++ unsigned StackWidth = TFL->getStackWidth(MF);
++
++ Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
++
++ if (VT.isVector()) {
++ unsigned NumElemVT = VT.getVectorNumElements();
++ EVT ElemVT = VT.getVectorElementType();
++ SDValue Loads[4];
++
++ assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
++ "vector width in load");
++
++ for (unsigned i = 0; i < NumElemVT; ++i) {
++ unsigned Channel, PtrIncr;
++ getStackAddress(StackWidth, i, Channel, PtrIncr);
++ Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
++ DAG.getConstant(PtrIncr, MVT::i32));
++ Loads[i] = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, ElemVT,
++ Chain, Ptr,
++ DAG.getTargetConstant(Channel, MVT::i32),
++ Op.getOperand(2));
++ }
++ for (unsigned i = NumElemVT; i < 4; ++i) {
++ Loads[i] = DAG.getUNDEF(ElemVT);
++ }
++ EVT TargetVT = EVT::getVectorVT(*DAG.getContext(), ElemVT, 4);
++ LoweredLoad = DAG.getNode(ISD::BUILD_VECTOR, DL, TargetVT, Loads, 4);
++ } else {
++ LoweredLoad = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, VT,
++ Chain, Ptr,
++ DAG.getTargetConstant(0, MVT::i32), // Channel
++ Op.getOperand(2));
++ }
++
++ SDValue Ops[2];
++ Ops[0] = LoweredLoad;
++ Ops[1] = Chain;
++
++ return DAG.getMergeValues(Ops, 2, DL);
++}
++
++SDValue R600TargetLowering::LowerFPOW(SDValue Op,
++ SelectionDAG &DAG) const {
++ DebugLoc DL = Op.getDebugLoc();
++ EVT VT = Op.getValueType();
++ SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
++ SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
++ return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
++}
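LowerFPOW rewrites pow via FLOG2, FMUL, and FEXP2 using the standard identity, valid for x > 0 (math sketch only):

    x^y = 2^{\,y \cdot \log_2 x}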
++
++/// XXX Only kernel functions are supported, so we can assume for now that
++/// every function is a kernel function, but in the future we should use
++/// separate calling conventions for kernel and non-kernel functions.
++SDValue R600TargetLowering::LowerFormalArguments(
+ SDValue Chain,
+ CallingConv::ID CallConv,
+ bool isVarArg,
@@ -14435,7 +15014,7 @@ index 0000000..d6b9d90
+ AMDGPUAS::PARAM_I_ADDRESS);
+ SDValue Arg = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getRoot(),
+ DAG.getConstant(ParamOffsetBytes, MVT::i32),
-+ MachinePointerInfo(new Argument(PtrTy)),
++ MachinePointerInfo(UndefValue::get(PtrTy)),
+ ArgVT, false, false, ArgBytes);
+ InVals.push_back(Arg);
+ ParamOffsetBytes += ArgBytes;
@@ -14466,15 +15045,94 @@ index 0000000..d6b9d90
+ }
+ break;
+ }
++
++  // (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0, cc))) ->
++  // (i32 select_cc f32, f32, -1, 0, cc)
++ //
++ // Mesa's GLSL frontend generates the above pattern a lot and we can lower
++ // this to one of the SET*_DX10 instructions.
++ case ISD::FP_TO_SINT: {
++ SDValue FNeg = N->getOperand(0);
++ if (FNeg.getOpcode() != ISD::FNEG) {
++ return SDValue();
++ }
++ SDValue SelectCC = FNeg.getOperand(0);
++ if (SelectCC.getOpcode() != ISD::SELECT_CC ||
++ SelectCC.getOperand(0).getValueType() != MVT::f32 || // LHS
++ SelectCC.getOperand(2).getValueType() != MVT::f32 || // True
++ !isHWTrueValue(SelectCC.getOperand(2)) ||
++ !isHWFalseValue(SelectCC.getOperand(3))) {
++ return SDValue();
++ }
++
++ return DAG.getNode(ISD::SELECT_CC, N->getDebugLoc(), N->getValueType(0),
++ SelectCC.getOperand(0), // LHS
++ SelectCC.getOperand(1), // RHS
++ DAG.getConstant(-1, MVT::i32), // True
++                           DAG.getConstant(0, MVT::i32), // False
++ SelectCC.getOperand(4)); // CC
++
++ break;
++ }
++ // Extract_vec (Build_vector) generated by custom lowering
++  // also needs to be custom combined
++ case ISD::EXTRACT_VECTOR_ELT: {
++ SDValue Arg = N->getOperand(0);
++ if (Arg.getOpcode() == ISD::BUILD_VECTOR) {
++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
++ unsigned Element = Const->getZExtValue();
++ return Arg->getOperand(Element);
++ }
++ }
++ if (Arg.getOpcode() == ISD::BITCAST &&
++ Arg.getOperand(0).getOpcode() == ISD::BUILD_VECTOR) {
++ if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
++ unsigned Element = Const->getZExtValue();
++ return DAG.getNode(ISD::BITCAST, N->getDebugLoc(), N->getVTList(),
++ Arg->getOperand(0).getOperand(Element));
++ }
++ }
++    break;
++  }
++
++ case ISD::SELECT_CC: {
++ // fold selectcc (selectcc x, y, a, b, cc), b, a, b, seteq ->
++ // selectcc x, y, a, b, inv(cc)
++ SDValue LHS = N->getOperand(0);
++ if (LHS.getOpcode() != ISD::SELECT_CC) {
++ return SDValue();
++ }
++
++ SDValue RHS = N->getOperand(1);
++ SDValue True = N->getOperand(2);
++ SDValue False = N->getOperand(3);
++
++ if (LHS.getOperand(2).getNode() != True.getNode() ||
++ LHS.getOperand(3).getNode() != False.getNode() ||
++ RHS.getNode() != False.getNode() ||
++ cast<CondCodeSDNode>(N->getOperand(4))->get() != ISD::SETEQ) {
++ return SDValue();
++ }
++
++ ISD::CondCode CCOpcode = cast<CondCodeSDNode>(LHS->getOperand(4))->get();
++ CCOpcode = ISD::getSetCCInverse(
++ CCOpcode, LHS.getOperand(0).getValueType().isInteger());
++ return DAG.getSelectCC(N->getDebugLoc(),
++ LHS.getOperand(0),
++ LHS.getOperand(1),
++ LHS.getOperand(2),
++ LHS.getOperand(3),
++ CCOpcode);
++
++ }
+ }
+ return SDValue();
+}
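The SELECT_CC fold in the last case can be sanity-checked with plain integers (a standalone C++ sketch, not part of the patch; when a == b both sides are trivially equal, so assume a != b):

    #include <cassert>

    // select_cc reduced to a scalar ternary for checking the identity:
    // ((cc ? a : b) == b ? a : b)  ==  (!cc ? a : b)
    static int sel(bool Cond, int A, int B) { return Cond ? A : B; }

    int main() {
      const int A = 7, B = 3;
      for (int CC = 0; CC <= 1; ++CC) {
        int Inner = sel(CC, A, B);
        assert(sel(Inner == B, A, B) == sel(!CC, A, B));
      }
    }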
diff --git a/lib/Target/R600/R600ISelLowering.h b/lib/Target/R600/R600ISelLowering.h
new file mode 100644
-index 0000000..2b954da
+index 0000000..afa3897
--- /dev/null
+++ b/lib/Target/R600/R600ISelLowering.h
-@@ -0,0 +1,72 @@
+@@ -0,0 +1,78 @@
+//===-- R600ISelLowering.h - R600 DAG Lowering Interface -*- C++ -*--------===//
+//
+// The LLVM Compiler Infrastructure
@@ -14540,7 +15198,13 @@ index 0000000..2b954da
+ SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerFPTOUINT(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerFPOW(SDValue Op, SelectionDAG &DAG) const;
-+
++ SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
++ SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const;
++
++ SDValue stackPtrToRegIndex(SDValue Ptr, unsigned StackWidth,
++ SelectionDAG &DAG) const;
++ void getStackAddress(unsigned StackWidth, unsigned ElemIdx,
++ unsigned &Channel, unsigned &PtrIncr) const;
+ bool isZero(SDValue Op) const;
+};
+
@@ -14549,10 +15213,10 @@ index 0000000..2b954da
+#endif // R600ISELLOWERING_H
diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp
new file mode 100644
-index 0000000..70ed41aba
+index 0000000..31671ea
--- /dev/null
+++ b/lib/Target/R600/R600InstrInfo.cpp
-@@ -0,0 +1,665 @@
+@@ -0,0 +1,776 @@
+//===-- R600InstrInfo.cpp - R600 Instruction Information ------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -14571,8 +15235,12 @@ index 0000000..70ed41aba
+#include "AMDGPUTargetMachine.h"
+#include "AMDGPUSubtarget.h"
+#include "R600Defines.h"
++#include "R600MachineFunctionInfo.h"
+#include "R600RegisterInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/Instructions.h"
+
+#define GET_INSTRINFO_CTOR
+#include "AMDGPUGenDFAPacketizer.inc"
@@ -14627,11 +15295,10 @@ index 0000000..70ed41aba
+MachineInstr * R600InstrInfo::getMovImmInstr(MachineFunction *MF,
+ unsigned DstReg, int64_t Imm) const {
+ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::MOV), DebugLoc());
-+ MachineInstrBuilder MIB(*MF, MI);
-+ MIB.addReg(DstReg, RegState::Define);
-+ MIB.addReg(AMDGPU::ALU_LITERAL_X);
-+ MIB.addImm(Imm);
-+ MIB.addReg(0); // PREDICATE_BIT
++ MachineInstrBuilder(MI).addReg(DstReg, RegState::Define);
++ MachineInstrBuilder(MI).addReg(AMDGPU::ALU_LITERAL_X);
++ MachineInstrBuilder(MI).addImm(Imm);
++ MachineInstrBuilder(MI).addReg(0); // PREDICATE_BIT
+
+ return MI;
+}
@@ -14659,7 +15326,6 @@ index 0000000..70ed41aba
+ switch (Opcode) {
+ default: return false;
+ case AMDGPU::RETURN:
-+ case AMDGPU::RESERVE_REG:
+ return true;
+ }
+}
@@ -15005,8 +15671,7 @@ index 0000000..70ed41aba
+ if (PIdx != -1) {
+ MachineOperand &PMO = MI->getOperand(PIdx);
+ PMO.setReg(Pred[2].getReg());
-+ MachineInstrBuilder MIB(*MI->getParent()->getParent(), MI);
-+ MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
++ MachineInstrBuilder(MI).addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
+ return true;
+ }
+
@@ -15021,6 +15686,124 @@ index 0000000..70ed41aba
+ return 2;
+}
+
++int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
++ const MachineRegisterInfo &MRI = MF.getRegInfo();
++ const MachineFrameInfo *MFI = MF.getFrameInfo();
++ int Offset = 0;
++
++ if (MFI->getNumObjects() == 0) {
++ return -1;
++ }
++
++ if (MRI.livein_empty()) {
++ return 0;
++ }
++
++ for (MachineRegisterInfo::livein_iterator LI = MRI.livein_begin(),
++ LE = MRI.livein_end();
++ LI != LE; ++LI) {
++ Offset = std::max(Offset,
++ GET_REG_INDEX(RI.getEncodingValue(LI->first)));
++ }
++
++ return Offset + 1;
++}
++
++int R600InstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
++ int Offset = 0;
++ const MachineFrameInfo *MFI = MF.getFrameInfo();
++
++ // Variable sized objects are not supported
++ assert(!MFI->hasVarSizedObjects());
++
++ if (MFI->getNumObjects() == 0) {
++ return -1;
++ }
++
++ Offset = TM.getFrameLowering()->getFrameIndexOffset(MF, -1);
++
++ return getIndirectIndexBegin(MF) + Offset;
++}
++
++std::vector<unsigned> R600InstrInfo::getIndirectReservedRegs(
++ const MachineFunction &MF) const {
++ const AMDGPUFrameLowering *TFL =
++ static_cast<const AMDGPUFrameLowering*>(TM.getFrameLowering());
++ std::vector<unsigned> Regs;
++
++ unsigned StackWidth = TFL->getStackWidth(MF);
++ int End = getIndirectIndexEnd(MF);
++
++ if (End == -1) {
++ return Regs;
++ }
++
++ for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {
++ unsigned SuperReg = AMDGPU::R600_Reg128RegClass.getRegister(Index);
++ Regs.push_back(SuperReg);
++ for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {
++ unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan);
++ Regs.push_back(Reg);
++ }
++ }
++ return Regs;
++}
++
++unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex,
++ unsigned Channel) const {
++ // XXX: Remove when we support a stack width > 2
++ assert(Channel == 0);
++ return RegIndex;
++}
++
++const TargetRegisterClass * R600InstrInfo::getIndirectAddrStoreRegClass(
++ unsigned SourceReg) const {
++ return &AMDGPU::R600_TReg32RegClass;
++}
++
++const TargetRegisterClass *R600InstrInfo::getIndirectAddrLoadRegClass() const {
++ return &AMDGPU::TRegMemRegClass;
++}
++
++MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const {
++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
++ AMDGPU::AR_X, OffsetReg);
++ setImmOperand(MOVA, R600Operands::WRITE, 0);
++
++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
++ AddrReg, ValueReg)
++ .addReg(AMDGPU::AR_X, RegState::Implicit);
++ setImmOperand(Mov, R600Operands::DST_REL, 1);
++ return Mov;
++}
++
++MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const {
++ unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
++ MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
++ AMDGPU::AR_X,
++ OffsetReg);
++ setImmOperand(MOVA, R600Operands::WRITE, 0);
++ MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
++ ValueReg,
++ AddrReg)
++ .addReg(AMDGPU::AR_X, RegState::Implicit);
++ setImmOperand(Mov, R600Operands::SRC0_REL, 1);
++
++ return Mov;
++}
++
++const TargetRegisterClass *R600InstrInfo::getSuperIndirectRegClass() const {
++ return &AMDGPU::IndirectRegRegClass;
++}
++
++
+MachineInstrBuilder R600InstrInfo::buildDefaultInstruction(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator I,
+ unsigned Opcode,
@@ -15041,13 +15824,15 @@ index 0000000..70ed41aba
+ .addReg(Src0Reg) // $src0
+ .addImm(0) // $src0_neg
+ .addImm(0) // $src0_rel
-+ .addImm(0); // $src0_abs
++ .addImm(0) // $src0_abs
++ .addImm(-1); // $src0_sel
+
+ if (Src1Reg) {
+ MIB.addReg(Src1Reg) // $src1
+ .addImm(0) // $src1_neg
+ .addImm(0) // $src1_rel
-+ .addImm(0); // $src1_abs
++ .addImm(0) // $src1_abs
++ .addImm(-1); // $src1_sel
+ }
+
+ //XXX: The r600g finalizer expects this to be 1, once we've moved the
@@ -15076,16 +15861,6 @@ index 0000000..70ed41aba
+
+int R600InstrInfo::getOperandIdx(unsigned Opcode,
+ R600Operands::Ops Op) const {
-+ const static int OpTable[3][R600Operands::COUNT] = {
-+// W C S S S S S S S S
-+// R O D L S R R R S R R R S R R L P
-+// D U I M R A R C C C C C C C R C C A R I
-+// S E U T O E M C 0 0 0 C 1 1 1 C 2 2 S E M
-+// T M P E D L P 0 N R A 1 N R A 2 N R T D M
-+ {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1,-1, 9,10,11},
-+ {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,-1,-1,-1,13,14,15,16,17},
-+ {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8,-1, 9,10,11,12,13,14}
-+ };
+ unsigned TargetFlags = get(Opcode).TSFlags;
+ unsigned OpTableIdx;
+
@@ -15111,7 +15886,7 @@ index 0000000..70ed41aba
+ OpTableIdx = 2;
+ }
+
-+ return OpTable[OpTableIdx][Op];
++ return R600Operands::ALUOpTable[OpTableIdx][Op];
+}
+
+void R600InstrInfo::setImmOperand(MachineInstr *MI, R600Operands::Ops Op,
@@ -15220,10 +15995,10 @@ index 0000000..70ed41aba
+}
diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
new file mode 100644
-index 0000000..6bb0ca9
+index 0000000..278fad1
--- /dev/null
+++ b/lib/Target/R600/R600InstrInfo.h
-@@ -0,0 +1,169 @@
+@@ -0,0 +1,201 @@
+//===-- R600InstrInfo.h - R600 Instruction Info Interface -------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -15340,6 +16115,38 @@ index 0000000..6bb0ca9
+ virtual int getInstrLatency(const InstrItineraryData *ItinData,
+ SDNode *Node) const { return 1;}
+
++  /// \returns a list of all the registers that may be accessed using indirect
++ /// addressing.
++ std::vector<unsigned> getIndirectReservedRegs(const MachineFunction &MF) const;
++
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
++
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
++
++
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++ unsigned Channel) const;
++
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++ unsigned SourceReg) const;
++
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
++
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const;
++
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg, unsigned Address,
++ unsigned OffsetReg) const;
++
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
++
++
++  /// buildDefaultInstruction - This function returns a MachineInstr with
++ /// all the instruction modifiers initialized to their default values.
+ /// You can use this function to avoid manually specifying each instruction
+ /// modifier operand when building a new instruction.
+ ///
@@ -15395,10 +16202,10 @@ index 0000000..6bb0ca9
+#endif // R600INSTRINFO_H_
diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
new file mode 100644
-index 0000000..64bab18
+index 0000000..409da07
--- /dev/null
+++ b/lib/Target/R600/R600Instructions.td
-@@ -0,0 +1,1724 @@
+@@ -0,0 +1,1976 @@
+//===-- R600Instructions.td - R600 Instruction defs -------*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -15471,6 +16278,11 @@ index 0000000..64bab18
+ let PrintMethod = PM;
+}
+
++// src_sel for ALU src operands, see also ALU_CONST, ALU_PARAM registers
++def SEL : OperandWithDefaultOps <i32, (ops (i32 -1))> {
++ let PrintMethod = "printSel";
++}
++
+def LITERAL : InstFlag<"printLiteral">;
+
+def WRITE : InstFlag <"printWrite", 1>;
@@ -15487,9 +16299,16 @@ index 0000000..64bab18
+// default to 0.
+def LAST : InstFlag<"printLast", 1>;
+
++def FRAMEri : Operand<iPTR> {
++ let MIOperandInfo = (ops R600_Reg32:$ptr, i32imm:$index);
++}
++
+def ADDRParam : ComplexPattern<i32, 2, "SelectADDRParam", [], []>;
+def ADDRDWord : ComplexPattern<i32, 1, "SelectADDRDWord", [], []>;
+def ADDRVTX_READ : ComplexPattern<i32, 2, "SelectADDRVTX_READ", [], []>;
++def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;
++def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;
++def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;
+
+class R600ALU_Word0 {
+ field bits<32> Word0;
@@ -15574,6 +16393,55 @@ index 0000000..64bab18
+ let Word1{17-13} = alu_inst;
+}
+
++class VTX_WORD0 {
++ field bits<32> Word0;
++ bits<7> SRC_GPR;
++ bits<5> VC_INST;
++ bits<2> FETCH_TYPE;
++ bits<1> FETCH_WHOLE_QUAD;
++ bits<8> BUFFER_ID;
++ bits<1> SRC_REL;
++ bits<2> SRC_SEL_X;
++ bits<6> MEGA_FETCH_COUNT;
++
++ let Word0{4-0} = VC_INST;
++ let Word0{6-5} = FETCH_TYPE;
++ let Word0{7} = FETCH_WHOLE_QUAD;
++ let Word0{15-8} = BUFFER_ID;
++ let Word0{22-16} = SRC_GPR;
++ let Word0{23} = SRC_REL;
++ let Word0{25-24} = SRC_SEL_X;
++ let Word0{31-26} = MEGA_FETCH_COUNT;
++}
++
++class VTX_WORD1_GPR {
++ field bits<32> Word1;
++ bits<7> DST_GPR;
++ bits<1> DST_REL;
++ bits<3> DST_SEL_X;
++ bits<3> DST_SEL_Y;
++ bits<3> DST_SEL_Z;
++ bits<3> DST_SEL_W;
++ bits<1> USE_CONST_FIELDS;
++ bits<6> DATA_FORMAT;
++ bits<2> NUM_FORMAT_ALL;
++ bits<1> FORMAT_COMP_ALL;
++ bits<1> SRF_MODE_ALL;
++
++ let Word1{6-0} = DST_GPR;
++ let Word1{7} = DST_REL;
++ let Word1{8} = 0; // Reserved
++ let Word1{11-9} = DST_SEL_X;
++ let Word1{14-12} = DST_SEL_Y;
++ let Word1{17-15} = DST_SEL_Z;
++ let Word1{20-18} = DST_SEL_W;
++ let Word1{21} = USE_CONST_FIELDS;
++ let Word1{27-22} = DATA_FORMAT;
++ let Word1{29-28} = NUM_FORMAT_ALL;
++ let Word1{30} = FORMAT_COMP_ALL;
++ let Word1{31} = SRF_MODE_ALL;
++}
++
+/*
+XXX: R600 subtarget uses a slightly different encoding than the other
+subtargets. We currently handle this in R600MCCodeEmitter, but we may
@@ -15615,11 +16483,11 @@ index 0000000..64bab18
+ InstR600 <0,
+ (outs R600_Reg32:$dst),
+ (ins WRITE:$write, OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
+ !strconcat(opName,
+ "$clamp $dst$write$dst_rel$omod, "
-+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
+ "$literal $pred_sel$last"),
+ pattern,
+ itin>,
@@ -15655,13 +16523,13 @@ index 0000000..64bab18
+ (outs R600_Reg32:$dst),
+ (ins UEM:$update_exec_mask, UP:$update_pred, WRITE:$write,
+ OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
-+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs,
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs, SEL:$src1_sel,
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
+ !strconcat(opName,
+ "$clamp $update_exec_mask$update_pred$dst$write$dst_rel$omod, "
-+ "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
-+ "$src1_neg$src1_abs$src1$src1_abs$src1_rel, "
++ "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
++ "$src1_neg$src1_abs$src1$src1_sel$src1_abs$src1_rel, "
+ "$literal $pred_sel$last"),
+ pattern,
+ itin>,
@@ -15692,14 +16560,14 @@ index 0000000..64bab18
+ InstR600 <0,
+ (outs R600_Reg32:$dst),
+ (ins REL:$dst_rel, CLAMP:$clamp,
-+ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel,
-+ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel,
-+ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel,
++ R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, SEL:$src0_sel,
++ R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, SEL:$src1_sel,
++ R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel, SEL:$src2_sel,
+ LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
+ !strconcat(opName, "$clamp $dst$dst_rel, "
-+ "$src0_neg$src0$src0_rel, "
-+ "$src1_neg$src1$src1_rel, "
-+ "$src2_neg$src2$src2_rel, "
++ "$src0_neg$src0$src0_sel$src0_rel, "
++ "$src1_neg$src1$src1_sel$src1_rel, "
++ "$src2_neg$src2$src2_sel$src2_rel, "
+ "$literal $pred_sel$last"),
+ pattern,
+ itin>,
@@ -15743,6 +16611,27 @@ index 0000000..64bab18
+ }]
+>;
+
++def TEX_RECT : PatLeaf<
++ (imm),
++ [{uint32_t TType = (uint32_t)N->getZExtValue();
++ return TType == 5;
++ }]
++>;
++
++def TEX_ARRAY : PatLeaf<
++ (imm),
++ [{uint32_t TType = (uint32_t)N->getZExtValue();
++ return TType == 9 || TType == 10 || TType == 15 || TType == 16;
++ }]
++>;
++
++def TEX_SHADOW_ARRAY : PatLeaf<
++ (imm),
++ [{uint32_t TType = (uint32_t)N->getZExtValue();
++ return TType == 11 || TType == 12 || TType == 17;
++ }]
++>;
++
+class EG_CF_RAT <bits <8> cf_inst, bits <6> rat_inst, bits<4> rat_id, dag outs,
+ dag ins, string asm, list<dag> pattern> :
+ InstR600ISA <outs, ins, asm, pattern> {
@@ -15815,32 +16704,35 @@ index 0000000..64bab18
+ "Subtarget.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX">;
+
+//===----------------------------------------------------------------------===//
-+// Interpolation Instructions
++// R600 SDNodes
+//===----------------------------------------------------------------------===//
+
-+def INTERP: SDNode<"AMDGPUISD::INTERP",
-+ SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisInt<1>, SDTCisInt<2>]>
-+ >;
-+
-+def INTERP_P0: SDNode<"AMDGPUISD::INTERP_P0",
-+ SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisInt<1>]>
-+ >;
++def INTERP_PAIR_XY : AMDGPUShaderInst <
++ (outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),
++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
++ "INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",
++ []>;
++
++def INTERP_PAIR_ZW : AMDGPUShaderInst <
++ (outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),
++ (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
++ "INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",
++ []>;
++
++def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
++ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
++ [SDNPMayLoad]
++>;
+
-+let usesCustomInserter = 1 in {
-+def input_perspective : AMDGPUShaderInst <
-+ (outs R600_Reg128:$dst),
-+ (ins i32imm:$src0, i32imm:$src1),
-+ "input_perspective $src0 $src1 : dst",
-+ [(set R600_Reg128:$dst, (INTERP (i32 imm:$src0), (i32 imm:$src1)))]>;
-+} // End usesCustomInserter = 1
++//===----------------------------------------------------------------------===//
++// Interpolation Instructions
++//===----------------------------------------------------------------------===//
+
-+def input_constant : AMDGPUShaderInst <
++def INTERP_VEC_LOAD : AMDGPUShaderInst <
+ (outs R600_Reg128:$dst),
-+ (ins i32imm:$src),
-+ "input_perspective $src : dst",
-+ [(set R600_Reg128:$dst, (INTERP_P0 (i32 imm:$src)))]>;
-+
-+
++ (ins i32imm:$src0),
++ "INTERP_LOAD $src0 : $dst",
++ []>;
+
+def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {
+ let bank_swizzle = 5;
@@ -15908,19 +16800,24 @@ index 0000000..64bab18
+multiclass ExportPattern<Instruction ExportInst, bits<8> cf_inst> {
+ def : Pat<(int_R600_store_pixel_depth R600_Reg32:$reg),
+ (ExportInst
-+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
+ 0, 61, 0, 7, 7, 7, cf_inst, 0)
+ >;
+
+ def : Pat<(int_R600_store_pixel_stencil R600_Reg32:$reg),
+ (ExportInst
-+ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
++ (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
+ 0, 61, 7, 0, 7, 7, cf_inst, 0)
+ >;
+
-+ def : Pat<(int_R600_store_pixel_dummy),
++ def : Pat<(int_R600_store_dummy (i32 imm:$type)),
++ (ExportInst
++ (v4f32 (IMPLICIT_DEF)), imm:$type, 0, 7, 7, 7, 7, cf_inst, 0)
++ >;
++
++ def : Pat<(int_R600_store_dummy 1),
+ (ExportInst
-+ (v4f32 (IMPLICIT_DEF)), 0, 0, 7, 7, 7, 7, cf_inst, 0)
++ (v4f32 (IMPLICIT_DEF)), 1, 60, 7, 7, 7, 7, cf_inst, 0)
+ >;
+
+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 0),
@@ -15928,29 +16825,40 @@ index 0000000..64bab18
+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
+ 0, 1, 2, 3, cf_inst, 0)
+ >;
++ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
++ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm)),
++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ 0, 1, 2, 3, cf_inst, 0)
++ >;
++
++ def : Pat<(int_R600_store_swizzle (v4f32 R600_Reg128:$src), imm:$arraybase,
++ imm:$type),
++ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ 0, 1, 2, 3, cf_inst, 0)
++ >;
+}
+
+multiclass SteamOutputExportPattern<Instruction ExportInst,
+ bits<8> buf0inst, bits<8> buf1inst, bits<8> buf2inst, bits<8> buf3inst> {
+// Stream0
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++ (i32 imm:$arraybase), (i32 0), (i32 imm:$mask)),
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
+ 4095, imm:$mask, buf0inst, 0)>;
+// Stream1
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 2),
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++ (i32 imm:$arraybase), (i32 1), (i32 imm:$mask)),
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
+ 4095, imm:$mask, buf1inst, 0)>;
+// Stream2
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 3),
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++ (i32 imm:$arraybase), (i32 2), (i32 imm:$mask)),
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
+ 4095, imm:$mask, buf2inst, 0)>;
+// Stream3
-+ def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 4),
-+ (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+ (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++ def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++ (i32 imm:$arraybase), (i32 3), (i32 imm:$mask)),
++ (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
+ 4095, imm:$mask, buf3inst, 0)>;
+}
+
@@ -16025,6 +16933,34 @@ index 0000000..64bab18
+ COND_NE))]
+>;
+
++def SETE_DX10 : R600_2OP <
++ 0xC, "SETE_DX10",
++ [(set R600_Reg32:$dst,
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++ COND_EQ))]
++>;
++
++def SETGT_DX10 : R600_2OP <
++ 0xD, "SETGT_DX10",
++ [(set R600_Reg32:$dst,
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++ COND_GT))]
++>;
++
++def SETGE_DX10 : R600_2OP <
++ 0xE, "SETGE_DX10",
++ [(set R600_Reg32:$dst,
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++ COND_GE))]
++>;
++
++def SETNE_DX10 : R600_2OP <
++ 0xF, "SETNE_DX10",
++ [(set R600_Reg32:$dst,
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++ COND_NE))]
++>;
++
+def FRACT : R600_1OP_Helper <0x10, "FRACT", AMDGPUfract>;
+def TRUNC : R600_1OP_Helper <0x11, "TRUNC", int_AMDGPU_trunc>;
+def CEIL : R600_1OP_Helper <0x12, "CEIL", fceil>;
@@ -16085,7 +17021,7 @@ index 0000000..64bab18
+>;
+
+def SETGT_INT : R600_2OP <
-+ 0x3B, "SGT_INT",
++ 0x3B, "SETGT_INT",
+ [(set (i32 R600_Reg32:$dst),
+ (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETGT))]
+>;
@@ -16539,6 +17475,10 @@ index 0000000..64bab18
+ defm DOT4_eg : DOT4_Common<0xBE>;
+ defm CUBE_eg : CUBE_Common<0xC0>;
+
++let hasSideEffects = 1 in {
++ def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", []>;
++}
++
+ def TGSI_LIT_Z_eg : TGSI_LIT_Z_Common<MUL_LIT_eg, LOG_CLAMPED_eg, EXP_IEEE_eg>;
+
+ def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
@@ -16629,37 +17569,30 @@ index 0000000..64bab18
+>;
+
+class VTX_READ_eg <string name, bits<8> buffer_id, dag outs, list<dag> pattern>
-+ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern> {
-+
-+ // Operands
-+ bits<7> DST_GPR;
-+ bits<7> SRC_GPR;
++ : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern>,
++ VTX_WORD1_GPR, VTX_WORD0 {
+
+ // Static fields
-+ bits<5> VC_INST = 0;
-+ bits<2> FETCH_TYPE = 2;
-+ bits<1> FETCH_WHOLE_QUAD = 0;
-+ bits<8> BUFFER_ID = buffer_id;
-+ bits<1> SRC_REL = 0;
++ let VC_INST = 0;
++ let FETCH_TYPE = 2;
++ let FETCH_WHOLE_QUAD = 0;
++ let BUFFER_ID = buffer_id;
++ let SRC_REL = 0;
+ // XXX: We can infer this field based on the SRC_GPR. This would allow us
+ // to store vertex addresses in any channel, not just X.
-+ bits<2> SRC_SEL_X = 0;
-+ bits<6> MEGA_FETCH_COUNT;
-+ bits<1> DST_REL = 0;
-+ bits<3> DST_SEL_X;
-+ bits<3> DST_SEL_Y;
-+ bits<3> DST_SEL_Z;
-+ bits<3> DST_SEL_W;
++ let SRC_SEL_X = 0;
++ let DST_REL = 0;
+ // The docs say that if this bit is set, then DATA_FORMAT, NUM_FORMAT_ALL,
+ // FORMAT_COMP_ALL, SRF_MODE_ALL, and ENDIAN_SWAP fields will be ignored,
+ // however, based on my testing if USE_CONST_FIELDS is set, then all
+ // these fields need to be set to 0.
-+ bits<1> USE_CONST_FIELDS = 0;
-+ bits<6> DATA_FORMAT;
-+ bits<2> NUM_FORMAT_ALL = 1;
-+ bits<1> FORMAT_COMP_ALL = 0;
-+ bits<1> SRF_MODE_ALL = 0;
++ let USE_CONST_FIELDS = 0;
++ let NUM_FORMAT_ALL = 1;
++ let FORMAT_COMP_ALL = 0;
++ let SRF_MODE_ALL = 0;
+
++ let Inst{31-0} = Word0;
++ let Inst{63-32} = Word1;
+ // LLVM can only encode 64-bit instructions, so these fields are manually
+ // encoded in R600CodeEmitter
+ //
@@ -16670,29 +17603,7 @@ index 0000000..64bab18
+ // bits<1> ALT_CONST = 0;
+ // bits<2> BUFFER_INDEX_MODE = 0;
+
-+ // VTX_WORD0
-+ let Inst{4-0} = VC_INST;
-+ let Inst{6-5} = FETCH_TYPE;
-+ let Inst{7} = FETCH_WHOLE_QUAD;
-+ let Inst{15-8} = BUFFER_ID;
-+ let Inst{22-16} = SRC_GPR;
-+ let Inst{23} = SRC_REL;
-+ let Inst{25-24} = SRC_SEL_X;
-+ let Inst{31-26} = MEGA_FETCH_COUNT;
-+
-+ // VTX_WORD1_GPR
-+ let Inst{38-32} = DST_GPR;
-+ let Inst{39} = DST_REL;
-+ let Inst{40} = 0; // Reserved
-+ let Inst{43-41} = DST_SEL_X;
-+ let Inst{46-44} = DST_SEL_Y;
-+ let Inst{49-47} = DST_SEL_Z;
-+ let Inst{52-50} = DST_SEL_W;
-+ let Inst{53} = USE_CONST_FIELDS;
-+ let Inst{59-54} = DATA_FORMAT;
-+ let Inst{61-60} = NUM_FORMAT_ALL;
-+ let Inst{62} = FORMAT_COMP_ALL;
-+ let Inst{63} = SRF_MODE_ALL;
++
+
+ // VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
+ // is done in R600CodeEmitter
@@ -16788,6 +17699,10 @@ index 0000000..64bab18
+ [(set (i32 R600_TReg32_X:$dst), (load_param ADDRVTX_READ:$ptr))]
+>;
+
++def VTX_READ_PARAM_128_eg : VTX_READ_128_eg <0,
++ [(set (v4i32 R600_Reg128:$dst), (load_param ADDRVTX_READ:$ptr))]
++>;
++
+//===----------------------------------------------------------------------===//
+// VTX Read from global memory space
+//===----------------------------------------------------------------------===//
@@ -16818,6 +17733,12 @@ index 0000000..64bab18
+
+}
+
++//===----------------------------------------------------------------------===//
++// Register loads and stores - for indirect addressing
++//===----------------------------------------------------------------------===//
++
++defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;
++
+let Predicates = [isCayman] in {
+
+let isVector = 1 in {
@@ -16877,6 +17798,7 @@ index 0000000..64bab18
+ (ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags),
+ "", [], NullALU> {
+ let FlagOperandIdx = 3;
++ let isTerminator = 1;
+}
+
+let isTerminator = 1, isBranch = 1, isBarrier = 1 in {
@@ -16903,19 +17825,6 @@ index 0000000..64bab18
+
+} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1
+
-+def R600_LOAD_CONST : AMDGPUShaderInst <
-+ (outs R600_Reg32:$dst),
-+ (ins i32imm:$src0),
-+ "R600_LOAD_CONST $dst, $src0",
-+ [(set R600_Reg32:$dst, (int_AMDGPU_load_const imm:$src0))]
-+>;
-+
-+def RESERVE_REG : AMDGPUShaderInst <
-+ (outs),
-+ (ins i32imm:$src),
-+ "RESERVE_REG $src",
-+ [(int_AMDGPU_reserve_reg imm:$src)]
-+>;
+
+def TXD: AMDGPUShaderInst <
+ (outs R600_Reg128:$dst),
@@ -16946,22 +17855,148 @@ index 0000000..64bab18
+ "RETURN", [(IL_retflag)]>;
+}
+
-+//===--------------------------------------------------------------------===//
-+// Instructions support
-+//===--------------------------------------------------------------------===//
-+//===---------------------------------------------------------------------===//
-+// Custom Inserter for Branches and returns, this eventually will be a
-+// seperate pass
-+//===---------------------------------------------------------------------===//
-+let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
-+ def BRANCH : ILFormat<(outs), (ins brtarget:$target),
-+ "; Pseudo unconditional branch instruction",
-+ [(br bb:$target)]>;
-+ defm BRANCH_COND : BranchConditional<IL_brcond>;
-+}
+
-+//===---------------------------------------------------------------------===//
-+// Flow and Program control Instructions
++//===----------------------------------------------------------------------===//
++// Constant Buffer Addressing Support
++//===----------------------------------------------------------------------===//
++
++let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {
++def CONST_COPY : Instruction {
++ let OutOperandList = (outs R600_Reg32:$dst);
++ let InOperandList = (ins i32imm:$src);
++ let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
++ let AsmString = "CONST_COPY";
++ let neverHasSideEffects = 1;
++ let isAsCheapAsAMove = 1;
++ let Itinerary = NullALU;
++}
++} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"
++
++def TEX_VTX_CONSTBUF :
++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr), "VTX_READ_eg $dst, $ptr",
++ [(set R600_Reg128:$dst, (CONST_ADDRESS ADDRGA_VAR_OFFSET:$ptr))]>,
++ VTX_WORD1_GPR, VTX_WORD0 {
++
++ let VC_INST = 0;
++ let FETCH_TYPE = 2;
++ let FETCH_WHOLE_QUAD = 0;
++ let BUFFER_ID = 0;
++ let SRC_REL = 0;
++ let SRC_SEL_X = 0;
++ let DST_REL = 0;
++ let USE_CONST_FIELDS = 0;
++ let NUM_FORMAT_ALL = 2;
++ let FORMAT_COMP_ALL = 1;
++ let SRF_MODE_ALL = 1;
++ let MEGA_FETCH_COUNT = 16;
++ let DST_SEL_X = 0;
++ let DST_SEL_Y = 1;
++ let DST_SEL_Z = 2;
++ let DST_SEL_W = 3;
++ let DATA_FORMAT = 35;
++
++ let Inst{31-0} = Word0;
++ let Inst{63-32} = Word1;
++
++// LLVM can only encode 64-bit instructions, so these fields are manually
++// encoded in R600CodeEmitter
++//
++// bits<16> OFFSET;
++// bits<2> ENDIAN_SWAP = 0;
++// bits<1> CONST_BUF_NO_STRIDE = 0;
++// bits<1> MEGA_FETCH = 0;
++// bits<1> ALT_CONST = 0;
++// bits<2> BUFFER_INDEX_MODE = 0;
++
++
++
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
++// is done in R600CodeEmitter
++//
++// Inst{79-64} = OFFSET;
++// Inst{81-80} = ENDIAN_SWAP;
++// Inst{82} = CONST_BUF_NO_STRIDE;
++// Inst{83} = MEGA_FETCH;
++// Inst{84} = ALT_CONST;
++// Inst{86-85} = BUFFER_INDEX_MODE;
++// Inst{95-86} = 0; Reserved
++
++// VTX_WORD3 (Padding)
++//
++// Inst{127-96} = 0;
++}
++
++def TEX_VTX_TEXBUF:
++ InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), "TEX_VTX_EXPLICIT_READ $dst, $ptr",
++ [(set R600_Reg128:$dst, (int_R600_load_texbuf ADDRGA_VAR_OFFSET:$ptr, imm:$BUFFER_ID))]>,
++VTX_WORD1_GPR, VTX_WORD0 {
++
++  let VC_INST = 0;
++  let FETCH_TYPE = 2;
++  let FETCH_WHOLE_QUAD = 0;
++  let SRC_REL = 0;
++  let SRC_SEL_X = 0;
++  let DST_REL = 0;
++  let USE_CONST_FIELDS = 1;
++  let NUM_FORMAT_ALL = 0;
++  let FORMAT_COMP_ALL = 0;
++  let SRF_MODE_ALL = 1;
++  let MEGA_FETCH_COUNT = 16;
++  let DST_SEL_X = 0;
++  let DST_SEL_Y = 1;
++  let DST_SEL_Z = 2;
++  let DST_SEL_W = 3;
++  let DATA_FORMAT = 0;
++
++  let Inst{31-0} = Word0;
++  let Inst{63-32} = Word1;
++
++// LLVM can only encode 64-bit instructions, so these fields are manually
++// encoded in R600CodeEmitter
++//
++// bits<16> OFFSET;
++// bits<2> ENDIAN_SWAP = 0;
++// bits<1> CONST_BUF_NO_STRIDE = 0;
++// bits<1> MEGA_FETCH = 0;
++// bits<1> ALT_CONST = 0;
++// bits<2> BUFFER_INDEX_MODE = 0;
++
++
++
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
++// is done in R600CodeEmitter
++//
++// Inst{79-64} = OFFSET;
++// Inst{81-80} = ENDIAN_SWAP;
++// Inst{82} = CONST_BUF_NO_STRIDE;
++// Inst{83} = MEGA_FETCH;
++// Inst{84} = ALT_CONST;
++// Inst{86-85} = BUFFER_INDEX_MODE;
++// Inst{95-86} = 0; Reserved
++
++// VTX_WORD3 (Padding)
++//
++// Inst{127-96} = 0;
++}
++
++
++
++//===--------------------------------------------------------------------===//
++// Instructions support
++//===--------------------------------------------------------------------===//
++//===---------------------------------------------------------------------===//
++// Custom Inserter for Branches and returns; this eventually will be a
++// separate pass
++//===---------------------------------------------------------------------===//
++let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
++ def BRANCH : ILFormat<(outs), (ins brtarget:$target),
++ "; Pseudo unconditional branch instruction",
++ [(br bb:$target)]>;
++ defm BRANCH_COND : BranchConditional<IL_brcond>;
++}
++
++//===---------------------------------------------------------------------===//
++// Flow and Program control Instructions
+//===---------------------------------------------------------------------===//
+let isTerminator=1 in {
+ def SWITCH : ILFormat< (outs), (ins GPRI32:$src),
@@ -17045,6 +18080,18 @@ index 0000000..64bab18
+ (SGE R600_Reg32:$src1, R600_Reg32:$src0)
+>;
+
++// SETGT_DX10 reverse args
++def : Pat <
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LT),
++ (SETGT_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
++>;
++
++// SETGE_DX10 reverse args
++def : Pat <
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LE),
++ (SETGE_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
++>;
++
+// SETGT_INT reverse args
+def : Pat <
+ (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETLT),
@@ -17083,31 +18130,43 @@ index 0000000..64bab18
+ (SETE R600_Reg32:$src0, R600_Reg32:$src1)
+>;
+
++//SETE_DX10 - 'true if ordered'
++def : Pat <
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETO),
++ (SETE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
++>;
++
+//SNE - 'true if unordered'
+def : Pat <
+ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, FP_ONE, FP_ZERO, SETUO),
+ (SNE R600_Reg32:$src0, R600_Reg32:$src1)
+>;
+
-+def : Extract_Element <f32, v4f32, R600_Reg128, 0, sel_x>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 1, sel_y>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 2, sel_z>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 3, sel_w>;
++//SETNE_DX10 - 'true if unordered'
++def : Pat <
++ (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETUO),
++ (SETNE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
++>;
+
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sel_x>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sel_y>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sel_z>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sel_w>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 0, sub0>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 1, sub1>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 2, sub2>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 3, sub3>;
+
-+def : Extract_Element <i32, v4i32, R600_Reg128, 0, sel_x>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 1, sel_y>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 2, sel_z>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 3, sel_w>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sub0>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sub1>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sub2>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sub3>;
+
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sel_x>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sel_y>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sel_z>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sel_w>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 0, sub0>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 1, sub1>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 2, sub2>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 3, sub3>;
++
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sub0>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sub1>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sub2>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sub3>;
+
+def : Vector_Build <v4f32, R600_Reg128, f32, R600_Reg32>;
+def : Vector_Build <v4i32, R600_Reg128, i32, R600_Reg32>;
@@ -17125,10 +18184,10 @@ index 0000000..64bab18
+} // End isR600toCayman Predicate
diff --git a/lib/Target/R600/R600Intrinsics.td b/lib/Target/R600/R600Intrinsics.td
new file mode 100644
-index 0000000..3825bc4
+index 0000000..6046f0d
--- /dev/null
+++ b/lib/Target/R600/R600Intrinsics.td
-@@ -0,0 +1,32 @@
+@@ -0,0 +1,57 @@
+//===-- R600Intrinsics.td - R600 Instrinsic defs -------*- tablegen -*-----===//
+//
+// The LLVM Compiler Infrastructure
@@ -17143,30 +18202,283 @@ index 0000000..3825bc4
+//===----------------------------------------------------------------------===//
+
+let TargetPrefix = "R600", isTarget = 1 in {
-+ def int_R600_load_input : Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
-+ def int_R600_load_input_perspective :
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
-+ def int_R600_load_input_constant :
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
-+ def int_R600_load_input_linear :
-+ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
++ def int_R600_load_input :
++ Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
++ def int_R600_interp_input :
++ Intrinsic<[llvm_float_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
++ def int_R600_load_texbuf :
++ Intrinsic<[llvm_v4f32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
++ def int_R600_store_swizzle :
++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
++
+ def int_R600_store_stream_output :
-+ Intrinsic<[], [llvm_float_ty, llvm_i32_ty, llvm_i32_ty], []>;
++ Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], []>;
+ def int_R600_store_pixel_color :
+ Intrinsic<[], [llvm_float_ty, llvm_i32_ty], []>;
+ def int_R600_store_pixel_depth :
+ Intrinsic<[], [llvm_float_ty], []>;
+ def int_R600_store_pixel_stencil :
+ Intrinsic<[], [llvm_float_ty], []>;
-+ def int_R600_store_pixel_dummy :
-+ Intrinsic<[], [], []>;
++ def int_R600_store_dummy :
++ Intrinsic<[], [llvm_i32_ty], []>;
++}
++let TargetPrefix = "r600", isTarget = 1 in {
++
++class R600ReadPreloadRegisterIntrinsic<string name>
++ : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
++ GCCBuiltin<name>;
++
++multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
++ def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
++ def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
++ def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
++}
++
++defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
++ "__builtin_r600_read_global_size">;
++defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
++ "__builtin_r600_read_local_size">;
++defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
++ "__builtin_r600_read_ngroups">;
++defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
++ "__builtin_r600_read_tgid">;
++defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
++ "__builtin_r600_read_tidig">;
++}
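Each defm above expands into _x/_y/_z intrinsics tied to the matching __builtin_r600_read_* builtins, so device code can combine them in the usual way; a hypothetical sketch (it assumes a compiler that targets this backend and exposes the builtins):

    // Device-side usage of the preload-register builtins (assumed available).
    unsigned GlobalIdX(void) {
      // global id = group id * group size + local id, along X
      return __builtin_r600_read_tgid_x() * __builtin_r600_read_local_size_x() +
             __builtin_r600_read_tidig_x();
    }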
+diff --git a/lib/Target/R600/R600LowerConstCopy.cpp b/lib/Target/R600/R600LowerConstCopy.cpp
+new file mode 100644
+index 0000000..c8c27a8
+--- /dev/null
++++ b/lib/Target/R600/R600LowerConstCopy.cpp
+@@ -0,0 +1,222 @@
++//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to MOV---===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// This pass is intended to handle remaining ConstCopy pseudo MachineInstrs.
++/// ISel will fold each Const Buffer read inside a scalar ALU instruction.
++/// However, it cannot fold them inside vector instructions like DOT4 or CUBE;
++/// ISel emits ConstCopy instead. This pass (executed after
++/// ExpandingSpecialInstr) will try to fold them if possible or replace them
++/// by a MOV otherwise.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPU.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "R600InstrInfo.h"
++#include "llvm/GlobalValue.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++
++namespace llvm {
++
++class R600LowerConstCopy : public MachineFunctionPass {
++private:
++ static char ID;
++ const R600InstrInfo *TII;
++
++ struct ConstPairs {
++ unsigned XYPair;
++ unsigned ZWPair;
++ };
++
++ bool canFoldInBundle(ConstPairs &UsedConst, unsigned ReadConst) const;
++public:
++ R600LowerConstCopy(TargetMachine &tm);
++ virtual bool runOnMachineFunction(MachineFunction &MF);
++
++ const char *getPassName() const { return "R600 Eliminate Symbolic Operand"; }
++};
++
++char R600LowerConstCopy::ID = 0;
++
++R600LowerConstCopy::R600LowerConstCopy(TargetMachine &tm) :
++ MachineFunctionPass(ID),
++ TII (static_cast<const R600InstrInfo *>(tm.getInstrInfo()))
++{
++}
++
++bool R600LowerConstCopy::canFoldInBundle(ConstPairs &UsedConst,
++ unsigned ReadConst) const {
++ unsigned ReadConstChan = ReadConst & 3;
++ unsigned ReadConstIndex = ReadConst & (~3);
++ if (ReadConstChan < 2) {
++ if (!UsedConst.XYPair) {
++ UsedConst.XYPair = ReadConstIndex;
++ }
++ return UsedConst.XYPair == ReadConstIndex;
++ } else {
++ if (!UsedConst.ZWPair) {
++ UsedConst.ZWPair = ReadConstIndex;
++ }
++ return UsedConst.ZWPair == ReadConstIndex;
++ }
++}
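The rule being enforced: within one instruction group, all X/Y-channel constant reads must come from a single 4-dword constant slot, and all Z/W-channel reads from one (possibly different) slot. A standalone demo (not part of the patch), with addresses encoded as 4 * index + channel to match the function above; note that slot 0 doubles as 'unclaimed', a quirk shared with the original:

    #include <cassert>

    struct Pairs { unsigned XY = 0, ZW = 0; };

    // Standalone copy of the pairing check for demonstration.
    static bool canFold(Pairs &P, unsigned ReadConst) {
      unsigned Chan = ReadConst & 3, Index = ReadConst & ~3u;
      unsigned &Slot = (Chan < 2) ? P.XY : P.ZW;
      if (!Slot)
        Slot = Index; // first read claims the slot
      return Slot == Index;
    }

    int main() {
      Pairs P;
      assert(canFold(P, 4 * 8 + 0));  // const 8.x claims the XY slot
      assert(canFold(P, 4 * 8 + 1));  // const 8.y shares it
      assert(!canFold(P, 4 * 9 + 0)); // const 9.x conflicts on XY
      assert(canFold(P, 4 * 9 + 2));  // const 9.z still fits in the ZW slot
    }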
++
++static bool isControlFlow(const MachineInstr &MI) {
++ return (MI.getOpcode() == AMDGPU::IF_PREDICATE_SET) ||
++ (MI.getOpcode() == AMDGPU::ENDIF) ||
++ (MI.getOpcode() == AMDGPU::ELSE) ||
++ (MI.getOpcode() == AMDGPU::WHILELOOP) ||
++ (MI.getOpcode() == AMDGPU::BREAK);
++}
++
++bool R600LowerConstCopy::runOnMachineFunction(MachineFunction &MF) {
++
++ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++ BB != BB_E; ++BB) {
++ MachineBasicBlock &MBB = *BB;
++ DenseMap<unsigned, MachineInstr *> RegToConstIndex;
++ for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(),
++ E = MBB.instr_end(); I != E;) {
++
++ if (I->getOpcode() == AMDGPU::CONST_COPY) {
++ MachineInstr &MI = *I;
++ I = llvm::next(I);
++ unsigned DstReg = MI.getOperand(0).getReg();
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++ RegToConstIndex.find(DstReg);
++ if (SrcMI != RegToConstIndex.end()) {
++ SrcMI->second->eraseFromParent();
++ RegToConstIndex.erase(SrcMI);
++ }
++ MachineInstr *NewMI =
++ TII->buildDefaultInstruction(MBB, &MI, AMDGPU::MOV,
++ MI.getOperand(0).getReg(), AMDGPU::ALU_CONST);
++ TII->setImmOperand(NewMI, R600Operands::SRC0_SEL,
++ MI.getOperand(1).getImm());
++ RegToConstIndex[DstReg] = NewMI;
++ MI.eraseFromParent();
++ continue;
++ }
++
++ std::vector<unsigned> Defs;
++      // We consider all instructions as bundled because the algorithm that
++      // handles const read port limitations inside an IG is still valid for
++      // single instructions.
++ std::vector<MachineInstr *> Bundle;
++
++ if (I->isBundle()) {
++ unsigned BundleSize = I->getBundleSize();
++ for (unsigned i = 0; i < BundleSize; i++) {
++ I = llvm::next(I);
++ Bundle.push_back(I);
++ }
++ } else if (TII->isALUInstr(I->getOpcode())){
++ Bundle.push_back(I);
++ } else if (isControlFlow(*I)) {
++ RegToConstIndex.clear();
++ I = llvm::next(I);
++ continue;
++ } else {
++ MachineInstr &MI = *I;
++ for (MachineInstr::mop_iterator MOp = MI.operands_begin(),
++ MOpE = MI.operands_end(); MOp != MOpE; ++MOp) {
++ MachineOperand &MO = *MOp;
++ if (!MO.isReg())
++ continue;
++ if (MO.isDef()) {
++ Defs.push_back(MO.getReg());
++ } else {
++          // Either a TEX or an Export inst; prevent erasing the def of a
++          // used operand
++ RegToConstIndex.erase(MO.getReg());
++ for (MCSubRegIterator SR(MO.getReg(), &TII->getRegisterInfo());
++ SR.isValid(); ++SR) {
++ RegToConstIndex.erase(*SR);
++ }
++ }
++ }
++ }
++
++
++ R600Operands::Ops OpTable[3][2] = {
++ {R600Operands::SRC0, R600Operands::SRC0_SEL},
++ {R600Operands::SRC1, R600Operands::SRC1_SEL},
++ {R600Operands::SRC2, R600Operands::SRC2_SEL},
++ };
++
++    for (std::vector<MachineInstr *>::iterator It = Bundle.begin(),
++ ItE = Bundle.end(); It != ItE; ++It) {
++ MachineInstr *MI = *It;
++ if (TII->isPredicated(MI)) {
++ // We don't want to erase previous assignment
++ RegToConstIndex.erase(MI->getOperand(0).getReg());
++ } else {
++ int WriteIDX = TII->getOperandIdx(MI->getOpcode(), R600Operands::WRITE);
++ if (WriteIDX < 0 || MI->getOperand(WriteIDX).getImm())
++ Defs.push_back(MI->getOperand(0).getReg());
++ }
++ }
++
++ ConstPairs CP = {0,0};
++ for (unsigned SrcOp = 0; SrcOp < 3; SrcOp++) {
++      for (std::vector<MachineInstr *>::iterator It = Bundle.begin(),
++ ItE = Bundle.end(); It != ItE; ++It) {
++ MachineInstr *MI = *It;
++ int SrcIdx = TII->getOperandIdx(MI->getOpcode(), OpTable[SrcOp][0]);
++ if (SrcIdx < 0)
++ continue;
++ MachineOperand &MO = MI->getOperand(SrcIdx);
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++ RegToConstIndex.find(MO.getReg());
++ if (SrcMI != RegToConstIndex.end()) {
++ MachineInstr *CstMov = SrcMI->second;
++ int ConstMovSel =
++ TII->getOperandIdx(CstMov->getOpcode(), R600Operands::SRC0_SEL);
++ unsigned ConstIndex = CstMov->getOperand(ConstMovSel).getImm();
++ if (MI->isInsideBundle() && canFoldInBundle(CP, ConstIndex)) {
++ TII->setImmOperand(MI, OpTable[SrcOp][1], ConstIndex);
++ MI->getOperand(SrcIdx).setReg(AMDGPU::ALU_CONST);
++ } else {
++ RegToConstIndex.erase(SrcMI);
++ }
++ }
++ }
++ }
++
++ for (std::vector<unsigned>::iterator It = Defs.begin(), ItE = Defs.end();
++ It != ItE; ++It) {
++ DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++ RegToConstIndex.find(*It);
++ if (SrcMI != RegToConstIndex.end()) {
++ SrcMI->second->eraseFromParent();
++ RegToConstIndex.erase(SrcMI);
++ }
++ }
++ I = llvm::next(I);
++ }
++
++ if (MBB.succ_empty()) {
++ for (DenseMap<unsigned, MachineInstr *>::iterator
++ DI = RegToConstIndex.begin(), DE = RegToConstIndex.end();
++ DI != DE; ++DI) {
++ DI->second->eraseFromParent();
++ }
++ }
++ }
++ return false;
++}
++
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm) {
++ return new R600LowerConstCopy(tm);
++}
++
+}
++
++
diff --git a/lib/Target/R600/R600MachineFunctionInfo.cpp b/lib/Target/R600/R600MachineFunctionInfo.cpp
new file mode 100644
-index 0000000..4eb5efa
+index 0000000..40aec83
--- /dev/null
+++ b/lib/Target/R600/R600MachineFunctionInfo.cpp
-@@ -0,0 +1,34 @@
+@@ -0,0 +1,18 @@
+//===-- R600MachineFunctionInfo.cpp - R600 Machine Function Info-*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -17182,31 +18494,15 @@ index 0000000..4eb5efa
+using namespace llvm;
+
+R600MachineFunctionInfo::R600MachineFunctionInfo(const MachineFunction &MF)
-+ : MachineFunctionInfo(),
-+ HasLinearInterpolation(false),
-+ HasPerspectiveInterpolation(false) {
++ : MachineFunctionInfo() {
+ memset(Outputs, 0, sizeof(Outputs));
-+ memset(StreamOutputs, 0, sizeof(StreamOutputs));
+ }
-+
-+unsigned R600MachineFunctionInfo::GetIJPerspectiveIndex() const {
-+ assert(HasPerspectiveInterpolation);
-+ return 0;
-+}
-+
-+unsigned R600MachineFunctionInfo::GetIJLinearIndex() const {
-+ assert(HasLinearInterpolation);
-+ if (HasPerspectiveInterpolation)
-+ return 1;
-+ else
-+ return 0;
-+}
diff --git a/lib/Target/R600/R600MachineFunctionInfo.h b/lib/Target/R600/R600MachineFunctionInfo.h
new file mode 100644
-index 0000000..e97fb5b
+index 0000000..41e4894
--- /dev/null
+++ b/lib/Target/R600/R600MachineFunctionInfo.h
-@@ -0,0 +1,39 @@
+@@ -0,0 +1,33 @@
+//===-- R600MachineFunctionInfo.h - R600 Machine Function Info ----*- C++ -*-=//
+//
+// The LLVM Compiler Infrastructure
@@ -17222,6 +18518,7 @@ index 0000000..e97fb5b
+#ifndef R600MACHINEFUNCTIONINFO_H
+#define R600MACHINEFUNCTIONINFO_H
+
++#include "llvm/ADT/BitVector.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/SelectionDAG.h"
+#include <vector>
@@ -17232,15 +18529,8 @@ index 0000000..e97fb5b
+
+public:
+ R600MachineFunctionInfo(const MachineFunction &MF);
-+ std::vector<unsigned> ReservedRegs;
++ std::vector<unsigned> IndirectRegs;
+ SDNode *Outputs[16];
-+ SDNode *StreamOutputs[64][4];
-+ bool HasLinearInterpolation;
-+ bool HasPerspectiveInterpolation;
-+
-+ unsigned GetIJLinearIndex() const;
-+ unsigned GetIJPerspectiveIndex() const;
-+
+};
+
+} // End llvm namespace
@@ -17248,10 +18538,10 @@ index 0000000..e97fb5b
+#endif //R600MACHINEFUNCTIONINFO_H
diff --git a/lib/Target/R600/R600RegisterInfo.cpp b/lib/Target/R600/R600RegisterInfo.cpp
new file mode 100644
-index 0000000..a39f83d
+index 0000000..bbd7995
--- /dev/null
+++ b/lib/Target/R600/R600RegisterInfo.cpp
-@@ -0,0 +1,89 @@
+@@ -0,0 +1,99 @@
+//===-- R600RegisterInfo.cpp - R600 Register Information ------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -17269,6 +18559,7 @@ index 0000000..a39f83d
+#include "R600RegisterInfo.h"
+#include "AMDGPUTargetMachine.h"
+#include "R600Defines.h"
++#include "R600InstrInfo.h"
+#include "R600MachineFunctionInfo.h"
+
+using namespace llvm;
@@ -17282,7 +18573,6 @@ index 0000000..a39f83d
+
+BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
+ BitVector Reserved(getNumRegs());
-+ const R600MachineFunctionInfo * MFI = MF.getInfo<R600MachineFunctionInfo>();
+
+ Reserved.set(AMDGPU::ZERO);
+ Reserved.set(AMDGPU::HALF);
@@ -17292,21 +18582,30 @@ index 0000000..a39f83d
+ Reserved.set(AMDGPU::NEG_ONE);
+ Reserved.set(AMDGPU::PV_X);
+ Reserved.set(AMDGPU::ALU_LITERAL_X);
++ Reserved.set(AMDGPU::ALU_CONST);
+ Reserved.set(AMDGPU::PREDICATE_BIT);
+ Reserved.set(AMDGPU::PRED_SEL_OFF);
+ Reserved.set(AMDGPU::PRED_SEL_ZERO);
+ Reserved.set(AMDGPU::PRED_SEL_ONE);
+
-+ for (TargetRegisterClass::iterator I = AMDGPU::R600_CReg32RegClass.begin(),
-+ E = AMDGPU::R600_CReg32RegClass.end(); I != E; ++I) {
++ for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(),
++ E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) {
+ Reserved.set(*I);
+ }
+
-+ for (std::vector<unsigned>::const_iterator I = MFI->ReservedRegs.begin(),
-+ E = MFI->ReservedRegs.end(); I != E; ++I) {
++ for (TargetRegisterClass::iterator I = AMDGPU::TRegMemRegClass.begin(),
++ E = AMDGPU::TRegMemRegClass.end();
++ I != E; ++I) {
+ Reserved.set(*I);
+ }
+
++ const R600InstrInfo *RII = static_cast<const R600InstrInfo*>(&TII);
++ std::vector<unsigned> IndirectRegs = RII->getIndirectReservedRegs(MF);
++ for (std::vector<unsigned>::iterator I = IndirectRegs.begin(),
++ E = IndirectRegs.end();
++ I != E; ++I) {
++ Reserved.set(*I);
++ }
+ return Reserved;
+}
+
@@ -17335,12 +18634,13 @@ index 0000000..a39f83d
+unsigned R600RegisterInfo::getSubRegFromChannel(unsigned Channel) const {
+ switch (Channel) {
+ default: assert(!"Invalid channel index"); return 0;
-+ case 0: return AMDGPU::sel_x;
-+ case 1: return AMDGPU::sel_y;
-+ case 2: return AMDGPU::sel_z;
-+ case 3: return AMDGPU::sel_w;
++ case 0: return AMDGPU::sub0;
++ case 1: return AMDGPU::sub1;
++ case 2: return AMDGPU::sub2;
++ case 3: return AMDGPU::sub3;
+ }
+}
++
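Usage sketch for getSubRegFromChannel above (the register names are real
TableGen definitions from R600RegisterInfo.td below; the surrounding code is
illustrative, not part of the patch):

    // Extract the Y element of the 128-bit temporary T0.XYZW.
    unsigned SubIdx = TRI.getSubRegFromChannel(1);           // AMDGPU::sub1
    unsigned Elt = TRI.getSubReg(AMDGPU::T0_XYZW, SubIdx);   // AMDGPU::T0_Y
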
diff --git a/lib/Target/R600/R600RegisterInfo.h b/lib/Target/R600/R600RegisterInfo.h
new file mode 100644
index 0000000..c170ccb
@@ -17404,10 +18704,10 @@ index 0000000..c170ccb
+#endif // AMDIDSAREGISTERINFO_H_
diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
new file mode 100644
-index 0000000..d3d6d25
+index 0000000..a7d847a
--- /dev/null
+++ b/lib/Target/R600/R600RegisterInfo.td
-@@ -0,0 +1,107 @@
+@@ -0,0 +1,146 @@
+
+class R600Reg <string name, bits<16> encoding> : Register<name> {
+ let Namespace = "AMDGPU";
@@ -17429,7 +18729,7 @@ index 0000000..d3d6d25
+class R600Reg_128<string n, list<Register> subregs, bits<16> encoding> :
+ RegisterWithSubRegs<n, subregs> {
+ let Namespace = "AMDGPU";
-+ let SubRegIndices = [sel_x, sel_y, sel_z, sel_w];
++ let SubRegIndices = [sub0, sub1, sub2, sub3];
+ let HWEncoding = encoding;
+}
+
@@ -17438,9 +18738,11 @@ index 0000000..d3d6d25
+ // 32-bit Temporary Registers
+ def T#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index, Chan>;
+
-+ // 32-bit Constant Registers (There are more than 128, this the number
-+ // that is currently supported.
-+ def C#Index#_#Chan : R600RegWithChan <"C"#Index#"."#Chan, Index, Chan>;
++ // Indirect addressing offset registers
++ def Addr#Index#_#Chan : R600RegWithChan <"T("#Index#" + AR.x)."#Chan,
++ Index, Chan>;
++ def TRegMem#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index,
++ Chan>;
+ }
+ // 128-bit Temporary Registers
+ def T#Index#_XYZW : R600Reg_128 <"T"#Index#".XYZW",
@@ -17471,19 +18773,25 @@ index 0000000..d3d6d25
+def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>;
+def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>;
+def PRED_SEL_ONE : R600Reg<"Pred_sel_one", 3>;
++def AR_X : R600Reg<"AR.x", 0>;
+
+def R600_ArrayBase : RegisterClass <"AMDGPU", [f32, i32], 32,
+ (add (sequence "ArrayBase%u", 448, 464))>;
++// special registers for ALU src operands
++// const buffer reference, SRCx_SEL contains index
++def ALU_CONST : R600Reg<"CBuf", 0>;
++// interpolation param reference, SRCx_SEL contains index
++def ALU_PARAM : R600Reg<"Param", 0>;
++
++let isAllocatable = 0 in {
++
++// XXX: Only use the X channel, until we support wider stack widths
++def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X", 0, 127))>;
+
-+def R600_CReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
-+ (add (interleave
-+ (interleave (sequence "C%u_X", 0, 127),
-+ (sequence "C%u_Z", 0, 127)),
-+ (interleave (sequence "C%u_Y", 0, 127),
-+ (sequence "C%u_W", 0, 127))))>;
++} // End isAllocatable = 0
+
+def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
-+ (add (sequence "T%u_X", 0, 127))>;
++ (add (sequence "T%u_X", 0, 127), AR_X)>;
+
+def R600_TReg32_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
+ (add (sequence "T%u_Y", 0, 127))>;
@@ -17495,15 +18803,16 @@ index 0000000..d3d6d25
+ (add (sequence "T%u_W", 0, 127))>;
+
+def R600_TReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
-+ (add (interleave
-+ (interleave R600_TReg32_X, R600_TReg32_Z),
-+ (interleave R600_TReg32_Y, R600_TReg32_W)))>;
++ (interleave R600_TReg32_X, R600_TReg32_Y,
++ R600_TReg32_Z, R600_TReg32_W)>;
+
+def R600_Reg32 : RegisterClass <"AMDGPU", [f32, i32], 32, (add
+ R600_TReg32,
-+ R600_CReg32,
+ R600_ArrayBase,
-+ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF)>;
++ R600_Addr,
++ ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF,
++ ALU_CONST, ALU_PARAM
++ )>;
+
+def R600_Predicate : RegisterClass <"AMDGPU", [i32], 32, (add
+ PRED_SEL_OFF, PRED_SEL_ZERO, PRED_SEL_ONE)>;
@@ -17515,6 +18824,36 @@ index 0000000..d3d6d25
+ (add (sequence "T%u_XYZW", 0, 127))> {
+ let CopyCost = -1;
+}
++
++//===----------------------------------------------------------------------===//
++// Register classes for indirect addressing
++//===----------------------------------------------------------------------===//
++
++// Super register for all the Indirect Registers. This register class is used
++// by the REG_SEQUENCE instruction to specify the registers to use for direct
++// reads / writes which may be written / read by an indirect address.
++class IndirectSuper<string n, list<Register> subregs> :
++ RegisterWithSubRegs<n, subregs> {
++ let Namespace = "AMDGPU";
++ let SubRegIndices =
++ [sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15];
++}
++
++def IndirectSuperReg : IndirectSuper<"Indirect",
++ [TRegMem0_X, TRegMem1_X, TRegMem2_X, TRegMem3_X, TRegMem4_X, TRegMem5_X,
++ TRegMem6_X, TRegMem7_X, TRegMem8_X, TRegMem9_X, TRegMem10_X, TRegMem11_X,
++ TRegMem12_X, TRegMem13_X, TRegMem14_X, TRegMem15_X]
++>;
++
++def IndirectReg : RegisterClass<"AMDGPU", [f32, i32], 32, (add IndirectSuperReg)>;
++
++// This register class defines the registers that are the storage units for
++// the "Indirect Addressing" pseudo memory space.
++// XXX: Only use the X channel, until we support wider stack widths
++def TRegMem : RegisterClass<"AMDGPU", [f32, i32], 32,
++ (add (sequence "TRegMem%u_X", 0, 16))
++>;
diff --git a/lib/Target/R600/R600Schedule.td b/lib/Target/R600/R600Schedule.td
new file mode 100644
index 0000000..7ede181
@@ -18053,10 +19392,10 @@ index 0000000..832e44d
+}
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
new file mode 100644
-index 0000000..cd6e0e9
+index 0000000..694c045
--- /dev/null
+++ b/lib/Target/R600/SIISelLowering.cpp
-@@ -0,0 +1,512 @@
+@@ -0,0 +1,399 @@
+//===-- SIISelLowering.cpp - SI DAG Lowering Implementation ---------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -18090,16 +19429,16 @@ index 0000000..cd6e0e9
+ addRegisterClass(MVT::f32, &AMDGPU::VReg_32RegClass);
+ addRegisterClass(MVT::i32, &AMDGPU::VReg_32RegClass);
+ addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);
-+ addRegisterClass(MVT::i1, &AMDGPU::SCCRegRegClass);
-+ addRegisterClass(MVT::i1, &AMDGPU::VCCRegRegClass);
++ addRegisterClass(MVT::i1, &AMDGPU::SReg_64RegClass);
+
-+ addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass);
-+ addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass);
++ addRegisterClass(MVT::v1i32, &AMDGPU::VReg_32RegClass);
++ addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass);
++ addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
++ addRegisterClass(MVT::v8i32, &AMDGPU::VReg_256RegClass);
++ addRegisterClass(MVT::v16i32, &AMDGPU::VReg_512RegClass);
+
+ computeRegisterProperties();
+
-+ setOperationAction(ISD::AND, MVT::i1, Custom);
-+
+ setOperationAction(ISD::ADD, MVT::i64, Legal);
+ setOperationAction(ISD::ADD, MVT::i32, Legal);
+
@@ -18125,23 +19464,16 @@ index 0000000..cd6e0e9
+ MachineRegisterInfo & MRI = BB->getParent()->getRegInfo();
+ MachineBasicBlock::iterator I = MI;
+
-+ if (TII->get(MI->getOpcode()).TSFlags & SIInstrFlags::NEED_WAIT) {
-+ AppendS_WAITCNT(MI, *BB, llvm::next(I));
-+ return BB;
-+ }
-+
+ switch (MI->getOpcode()) {
+ default:
+ return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
+ case AMDGPU::BRANCH: return BB;
+ case AMDGPU::CLAMP_SI:
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
+ .addOperand(MI->getOperand(0))
+ .addOperand(MI->getOperand(1))
-+ // VSRC1-2 are unused, but we still need to fill all the
-+ // operand slots, so we just reuse the VSRC0 operand
-+ .addOperand(MI->getOperand(1))
-+ .addOperand(MI->getOperand(1))
++ .addImm(0x80) // SRC1
++ .addImm(0x80) // SRC2
+ .addImm(0) // ABS
+ .addImm(1) // CLAMP
+ .addImm(0) // OMOD
@@ -18150,13 +19482,11 @@ index 0000000..cd6e0e9
+ break;
+
+ case AMDGPU::FABS_SI:
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
+ .addOperand(MI->getOperand(0))
+ .addOperand(MI->getOperand(1))
-+ // VSRC1-2 are unused, but we still need to fill all the
-+ // operand slots, so we just reuse the VSRC0 operand
-+ .addOperand(MI->getOperand(1))
-+ .addOperand(MI->getOperand(1))
++ .addImm(0x80) // SRC1
++ .addImm(0x80) // SRC2
+ .addImm(1) // ABS
+ .addImm(0) // CLAMP
+ .addImm(0) // OMOD
@@ -18165,13 +19495,11 @@ index 0000000..cd6e0e9
+ break;
+
+ case AMDGPU::FNEG_SI:
-+ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++ BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
+ .addOperand(MI->getOperand(0))
+ .addOperand(MI->getOperand(1))
-+ // VSRC1-2 are unused, but we still need to fill all the
-+ // operand slots, so we just reuse the VSRC0 operand
-+ .addOperand(MI->getOperand(1))
-+ .addOperand(MI->getOperand(1))
++ .addImm(0x80) // SRC1
++ .addImm(0x80) // SRC2
+ .addImm(0) // ABS
+ .addImm(0) // CLAMP
+ .addImm(0) // OMOD
@@ -18187,29 +19515,13 @@ index 0000000..cd6e0e9
+ case AMDGPU::SI_INTERP:
+ LowerSI_INTERP(MI, *BB, I, MRI);
+ break;
-+ case AMDGPU::SI_INTERP_CONST:
-+ LowerSI_INTERP_CONST(MI, *BB, I, MRI);
-+ break;
-+ case AMDGPU::SI_KIL:
-+ LowerSI_KIL(MI, *BB, I, MRI);
-+ break;
+ case AMDGPU::SI_WQM:
+ LowerSI_WQM(MI, *BB, I, MRI);
+ break;
-+ case AMDGPU::SI_V_CNDLT:
-+ LowerSI_V_CNDLT(MI, *BB, I, MRI);
-+ break;
+ }
+ return BB;
+}
+
-+void SITargetLowering::AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I) const {
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WAITCNT))
-+ .addImm(0);
-+}
-+
-+
+void SITargetLowering::LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WQM_B64), AMDGPU::EXEC)
@@ -18249,57 +19561,6 @@ index 0000000..cd6e0e9
+ MI->eraseFromParent();
+}
+
-+void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr *MI,
-+ MachineBasicBlock &BB, MachineBasicBlock::iterator I,
-+ MachineRegisterInfo &MRI) const {
-+ MachineOperand dst = MI->getOperand(0);
-+ MachineOperand attr_chan = MI->getOperand(1);
-+ MachineOperand attr = MI->getOperand(2);
-+ MachineOperand params = MI->getOperand(3);
-+ unsigned M0 = MRI.createVirtualRegister(&AMDGPU::M0RegRegClass);
-+
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_MOV_B32), M0)
-+ .addOperand(params);
-+
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
-+ .addOperand(dst)
-+ .addOperand(attr_chan)
-+ .addOperand(attr)
-+ .addReg(M0);
-+
-+ MI->eraseFromParent();
-+}
-+
-+void SITargetLowering::LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-+ // Clear this pixel from the exec mask if the operand is negative
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CMPX_LE_F32_e32),
-+ AMDGPU::VCC)
-+ .addReg(AMDGPU::SREG_LIT_0)
-+ .addOperand(MI->getOperand(0));
-+
-+ MI->eraseFromParent();
-+}
-+
-+void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-+ unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
-+
-+ BuildMI(BB, I, BB.findDebugLoc(I),
-+ TII->get(AMDGPU::V_CMP_GT_F32_e32),
-+ VCC)
-+ .addReg(AMDGPU::SREG_LIT_0)
-+ .addOperand(MI->getOperand(1));
-+
-+ BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32))
-+ .addOperand(MI->getOperand(0))
-+ .addOperand(MI->getOperand(3))
-+ .addOperand(MI->getOperand(2))
-+ .addReg(VCC);
-+
-+ MI->eraseFromParent();
-+}
-+
+EVT SITargetLowering::getSetCCResultType(EVT VT) const {
+ return MVT::i1;
+}
@@ -18314,7 +19575,6 @@ index 0000000..cd6e0e9
+ case ISD::BRCOND: return LowerBRCOND(Op, DAG);
+ case ISD::LOAD: return LowerLOAD(Op, DAG);
+ case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
-+ case ISD::AND: return Loweri1ContextSwitch(Op, DAG, ISD::AND);
+ case ISD::INTRINSIC_WO_CHAIN: {
+ unsigned IntrinsicID =
+ cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
@@ -18331,30 +19591,6 @@ index 0000000..cd6e0e9
+ return SDValue();
+}
+
-+/// \brief The function is for lowering i1 operations on the
-+/// VCC register.
-+///
-+/// In the VALU context, VCC is a one bit register, but in the
-+/// SALU context the VCC is a 64-bit register (1-bit per thread). Since only
-+/// the SALU can perform operations on the VCC register, we need to promote
-+/// the operand types from i1 to i64 in order for tablegen to be able to match
-+/// this operation to the correct SALU instruction. We do this promotion by
-+/// wrapping the operands in a CopyToReg node.
-+///
-+SDValue SITargetLowering::Loweri1ContextSwitch(SDValue Op,
-+ SelectionDAG &DAG,
-+ unsigned VCCNode) const {
-+ DebugLoc DL = Op.getDebugLoc();
-+
-+ SDValue OpNode = DAG.getNode(VCCNode, DL, MVT::i64,
-+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
-+ Op.getOperand(0)),
-+ DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
-+ Op.getOperand(1)));
-+
-+ return DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i1, OpNode);
-+}
-+
+/// \brief Helper function for LowerBRCOND
+static SDNode *findUser(SDValue Value, unsigned Opcode) {
+
@@ -18559,22 +19795,12 @@ index 0000000..cd6e0e9
+ }
+ return SDValue();
+}
-+
-+#define NODE_NAME_CASE(node) case SIISD::node: return #node;
-+
-+const char* SITargetLowering::getTargetNodeName(unsigned Opcode) const {
-+ switch (Opcode) {
-+ default: return AMDGPUTargetLowering::getTargetNodeName(Opcode);
-+ NODE_NAME_CASE(VCC_AND)
-+ NODE_NAME_CASE(VCC_BITCAST)
-+ }
-+}
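The CLAMP_SI/FABS_SI/FNEG_SI cases earlier in this file all share one shape:
V_ADD_F32_e64 dst, src, 0 is used as a move whose ABS/CLAMP/NEG modifier bits do
the real work, 0x80 presumably being the 9-bit source encoding of the inline
constant 0. A hedged sketch of that common shape (the helper and its parameters
are invented for illustration, not part of the patch):

    // Hypothetical helper, not in the patch.
    static void emitMovWithMods(MachineBasicBlock &BB,
                                MachineBasicBlock::iterator I,
                                const SIInstrInfo *TII, MachineInstr *MI,
                                unsigned Abs, unsigned Clamp, unsigned Neg) {
      BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
        .addOperand(MI->getOperand(0))  // dst
        .addOperand(MI->getOperand(1))  // src
        .addImm(0x80)                   // SRC1: inline constant 0 (assumed)
        .addImm(0x80)                   // SRC2: unused
        .addImm(Abs)
        .addImm(Clamp)
        .addImm(0)                      // OMOD
        .addImm(Neg);
      MI->eraseFromParent();
    }
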
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
new file mode 100644
-index 0000000..c088112
+index 0000000..5d048f8
--- /dev/null
+++ b/lib/Target/R600/SIISelLowering.h
-@@ -0,0 +1,62 @@
+@@ -0,0 +1,48 @@
+//===-- SIISelLowering.h - SI DAG Lowering Interface ------------*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -18600,26 +19826,13 @@ index 0000000..c088112
+class SITargetLowering : public AMDGPUTargetLowering {
+ const SIInstrInfo * TII;
+
-+ /// Memory reads and writes are syncronized using the S_WAITCNT instruction.
-+ /// This function takes the most conservative approach and inserts an
-+ /// S_WAITCNT instruction after every read and write.
-+ void AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I) const;
+ void LowerMOV_IMM(MachineInstr *MI, MachineBasicBlock &BB,
+ MachineBasicBlock::iterator I, unsigned Opcode) const;
+ void LowerSI_INTERP(MachineInstr *MI, MachineBasicBlock &BB,
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-+ void LowerSI_INTERP_CONST(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I, MachineRegisterInfo &MRI) const;
-+ void LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
+ void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-+ void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-+ MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
+
-+ SDValue Loweri1ContextSwitch(SDValue Op, SelectionDAG &DAG,
-+ unsigned VCCNode) const;
+ SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
@@ -18631,18 +19844,376 @@ index 0000000..c088112
+ virtual EVT getSetCCResultType(EVT VT) const;
+ virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
+ virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
-+ virtual const char* getTargetNodeName(unsigned Opcode) const;
+};
+
+} // End namespace llvm
+
+#endif //SIISELLOWERING_H
+diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
+new file mode 100644
+index 0000000..24fc929
+--- /dev/null
++++ b/lib/Target/R600/SIInsertWaits.cpp
+@@ -0,0 +1,353 @@
++//===-- SIInsertWaits.cpp - Insert wait instructions for memory ops -------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Insert wait instructions for memory reads and writes.
++///
++/// Memory reads and writes are issued asynchronously, so we need to insert
++/// S_WAITCNT instructions when we want to access any of their results or
++/// overwrite any register that's used asynchronously.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPU.h"
++#include "SIInstrInfo.h"
++#include "SIMachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++
++using namespace llvm;
++
++namespace {
++
++/// \brief One variable for each of the hardware counters
++typedef union {
++ struct {
++ unsigned VM;
++ unsigned EXP;
++ unsigned LGKM;
++ } Named;
++ unsigned Array[3];
++
++} Counters;
++
++typedef Counters RegCounters[512];
++typedef std::pair<unsigned, unsigned> RegInterval;
++
++class SIInsertWaits : public MachineFunctionPass {
++
++private:
++ static char ID;
++ const SIInstrInfo *TII;
++ const SIRegisterInfo &TRI;
++ const MachineRegisterInfo *MRI;
++
++ /// \brief Constant hardware limits
++ static const Counters WaitCounts;
++
++ /// \brief Constant zero value
++ static const Counters ZeroCounts;
++
++ /// \brief Counter values we have already waited on.
++ Counters WaitedOn;
++
++ /// \brief Counter values for last instruction issued.
++ Counters LastIssued;
++
++ /// \brief Registers used by async instructions.
++ RegCounters UsedRegs;
++
++ /// \brief Registers defined by async instructions.
++ RegCounters DefinedRegs;
++
++ /// \brief Different export instruction types seen since last wait.
++ unsigned ExpInstrTypesSeen;
++
++ /// \brief Get increment/decrement amount for this instruction.
++ Counters getHwCounts(MachineInstr &MI);
++
++ /// \brief Is operand relevant for async execution?
++ bool isOpRelevant(MachineOperand &Op);
++
++ /// \brief Get register interval an operand affects.
++ RegInterval getRegInterval(MachineOperand &Op);
++
++ /// \brief Handle an instruction's async components
++ void pushInstruction(MachineInstr &MI);
++
++ /// \brief Insert the actual wait instruction
++ bool insertWait(MachineBasicBlock &MBB,
++ MachineBasicBlock::iterator I,
++ const Counters &Counts);
++
++ /// \brief Resolve all operand dependencies to counter requirements
++ Counters handleOperands(MachineInstr &MI);
++
++public:
++ SIInsertWaits(TargetMachine &tm) :
++ MachineFunctionPass(ID),
++ TII(static_cast<const SIInstrInfo*>(tm.getInstrInfo())),
++ TRI(TII->getRegisterInfo()) { }
++
++ virtual bool runOnMachineFunction(MachineFunction &MF);
++
++ const char *getPassName() const {
++ return "SI insert wait instructions";
++ }
++
++};
++
++} // End anonymous namespace
++
++char SIInsertWaits::ID = 0;
++
++const Counters SIInsertWaits::WaitCounts = { { 15, 7, 7 } };
++const Counters SIInsertWaits::ZeroCounts = { { 0, 0, 0 } };
++
++FunctionPass *llvm::createSIInsertWaits(TargetMachine &tm) {
++ return new SIInsertWaits(tm);
++}
++
++Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
++
++ uint64_t TSFlags = TII->get(MI.getOpcode()).TSFlags;
++ Counters Result;
++
++ Result.Named.VM = !!(TSFlags & SIInstrFlags::VM_CNT);
++
++ // Only consider stores or EXP for EXP_CNT
++ Result.Named.EXP = !!(TSFlags & SIInstrFlags::EXP_CNT &&
++ (MI.getOpcode() == AMDGPU::EXP || MI.getDesc().mayStore()));
++
++ // LGKM may use larger values
++ if (TSFlags & SIInstrFlags::LGKM_CNT) {
++
++ MachineOperand &Op = MI.getOperand(0);
++ assert(Op.isReg() && "First LGKM operand must be a register!");
++
++ unsigned Reg = Op.getReg();
++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
++ Result.Named.LGKM = Size > 4 ? 2 : 1;
++
++ } else {
++ Result.Named.LGKM = 0;
++ }
++
++ return Result;
++}
++
++bool SIInsertWaits::isOpRelevant(MachineOperand &Op) {
++
++ // Constants are always irrelevant
++ if (!Op.isReg())
++ return false;
++
++ // Defines are always relevant
++ if (Op.isDef())
++ return true;
++
++ // For exports all registers are relevant
++ MachineInstr &MI = *Op.getParent();
++ if (MI.getOpcode() == AMDGPU::EXP)
++ return true;
++
++ // For stores the stored value is also relevant
++ if (!MI.getDesc().mayStore())
++ return false;
++
++ for (MachineInstr::mop_iterator I = MI.operands_begin(),
++ E = MI.operands_end(); I != E; ++I) {
++
++ if (I->isReg() && I->isUse())
++ return Op.isIdenticalTo(*I);
++ }
++
++ return false;
++}
++
++RegInterval SIInsertWaits::getRegInterval(MachineOperand &Op) {
++
++ if (!Op.isReg())
++ return std::make_pair(0, 0);
++
++ unsigned Reg = Op.getReg();
++ unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
++
++ assert(Size >= 4);
++
++ RegInterval Result;
++ Result.first = TRI.getEncodingValue(Reg);
++ Result.second = Result.first + Size / 4;
++
++ return Result;
++}
++
++void SIInsertWaits::pushInstruction(MachineInstr &MI) {
++
++ // Get the hardware counter increments and sum them up
++ Counters Increment = getHwCounts(MI);
++ unsigned Sum = 0;
++
++ for (unsigned i = 0; i < 3; ++i) {
++ LastIssued.Array[i] += Increment.Array[i];
++ Sum += Increment.Array[i];
++ }
++
++ // If we don't increase anything then that's it
++ if (Sum == 0)
++ return;
++
++ // Remember which export instructions we have seen
++ if (Increment.Named.EXP) {
++ ExpInstrTypesSeen |= MI.getOpcode() == AMDGPU::EXP ? 1 : 2;
++ }
++
++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
++
++ MachineOperand &Op = MI.getOperand(i);
++ if (!isOpRelevant(Op))
++ continue;
++
++ RegInterval Interval = getRegInterval(Op);
++ for (unsigned j = Interval.first; j < Interval.second; ++j) {
++
++ // Remember which registers we define
++ if (Op.isDef())
++ DefinedRegs[j] = LastIssued;
++
++ // and which ones we are using
++ if (Op.isUse())
++ UsedRegs[j] = LastIssued;
++ }
++ }
++}
++
++bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
++ MachineBasicBlock::iterator I,
++ const Counters &Required) {
++
++ // End of program? No need to wait on anything
++ if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM)
++ return false;
++
++ // Figure out if the async instructions execute in order
++ bool Ordered[3];
++
++ // VM_CNT is always ordered
++ Ordered[0] = true;
++
++ // EXP_CNT is unordered if we have both EXP & VM-writes
++ Ordered[1] = ExpInstrTypesSeen == 3;
++
++ // LGKM_CNT is handled as always unordered. TODO: Handle LDS and GDS
++ Ordered[2] = false;
++
++ // The values we are going to put into the S_WAITCNT instruction
++ Counters Counts = WaitCounts;
++
++ // Do we really need to wait?
++ bool NeedWait = false;
++
++ for (unsigned i = 0; i < 3; ++i) {
++
++ if (Required.Array[i] <= WaitedOn.Array[i])
++ continue;
++
++ NeedWait = true;
++
++ if (Ordered[i]) {
++ unsigned Value = LastIssued.Array[i] - Required.Array[i];
++
++ // Adjust the value to the real hardware possibilities
++ Counts.Array[i] = std::min(Value, WaitCounts.Array[i]);
++
++ } else
++ Counts.Array[i] = 0;
++
++ // Remember what we have waited on
++ WaitedOn.Array[i] = LastIssued.Array[i] - Counts.Array[i];
++ }
++
++ if (!NeedWait)
++ return false;
++
++ // Reset EXP_CNT instruction types
++ if (Counts.Named.EXP == 0)
++ ExpInstrTypesSeen = 0;
++
++ // Build the wait instruction
++ BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
++ .addImm((Counts.Named.VM & 0xF) |
++ ((Counts.Named.EXP & 0x7) << 4) |
++ ((Counts.Named.LGKM & 0x7) << 8));
++
++ return true;
++}
++
++/// \brief Helper function for handleOperands
++static void increaseCounters(Counters &Dst, const Counters &Src) {
++
++ for (unsigned i = 0; i < 3; ++i)
++ Dst.Array[i] = std::max(Dst.Array[i], Src.Array[i]);
++}
++
++Counters SIInsertWaits::handleOperands(MachineInstr &MI) {
++
++ Counters Result = ZeroCounts;
++
++ // For each register affected by this instruction,
++ // raise the counter values we must wait for
++ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
++
++ MachineOperand &Op = MI.getOperand(i);
++ RegInterval Interval = getRegInterval(Op);
++ for (unsigned j = Interval.first; j < Interval.second; ++j) {
++
++ if (Op.isDef())
++ increaseCounters(Result, UsedRegs[j]);
++
++ if (Op.isUse())
++ increaseCounters(Result, DefinedRegs[j]);
++ }
++ }
++
++ return Result;
++}
++
++bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
++
++ bool Changes = false;
++
++ MRI = &MF.getRegInfo();
++
++ WaitedOn = ZeroCounts;
++ LastIssued = ZeroCounts;
++
++ memset(&UsedRegs, 0, sizeof(UsedRegs));
++ memset(&DefinedRegs, 0, sizeof(DefinedRegs));
++
++ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
++ BI != BE; ++BI) {
++
++ MachineBasicBlock &MBB = *BI;
++ for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
++ I != E; ++I) {
++
++ Changes |= insertWait(MBB, I, handleOperands(*I));
++ pushInstruction(*I);
++ }
++
++ // Wait for everything at the end of the MBB
++ Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);
++ }
++
++ return Changes;
++}
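The immediate built at the end of insertWait packs all three counters into one
S_WAITCNT operand: VM_CNT in bits 3:0, EXP_CNT in bits 6:4 and LGKM_CNT in bits
10:8, with the hardware maxima {15, 7, 7} coming from WaitCounts. Restated as a
self-contained helper (illustrative only, not part of the patch):

    // Mirrors the bit layout used by insertWait() above.
    static unsigned encodeWaitcnt(unsigned VM, unsigned EXP, unsigned LGKM) {
      return (VM & 0xF) | ((EXP & 0x7) << 4) | ((LGKM & 0x7) << 8);
    }
    // encodeWaitcnt(15, 7, 7) == 0x77f  -> wait on nothing (all maxed out)
    // encodeWaitcnt(0, 7, 7)  == 0x770  -> drain VM_CNT completely
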
diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
new file mode 100644
-index 0000000..aea3b5a
+index 0000000..40e37aa
--- /dev/null
+++ b/lib/Target/R600/SIInstrFormats.td
-@@ -0,0 +1,146 @@
+@@ -0,0 +1,188 @@
+//===-- SIInstrFormats.td - SI Instruction Formats ------------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -18666,40 +20237,23 @@ index 0000000..aea3b5a
+//
+//===----------------------------------------------------------------------===//
+
-+class VOP3b_2IN <bits<9> op, string opName, RegisterClass dstClass,
-+ RegisterClass src0Class, RegisterClass src1Class,
-+ list<dag> pattern>
-+ : VOP3b <op, (outs dstClass:$vdst),
-+ (ins src0Class:$src0, src1Class:$src1, InstFlag:$src2, InstFlag:$sdst,
-+ InstFlag:$omod, InstFlag:$neg),
-+ opName, pattern
-+>;
-+
-+
-+class VOP3_1_32 <bits<9> op, string opName, list<dag> pattern>
-+ : VOP3b_2IN <op, opName, SReg_1, AllReg_32, VReg_32, pattern>;
-+
+class VOP3_32 <bits<9> op, string opName, list<dag> pattern>
-+ : VOP3 <op, (outs VReg_32:$dst), (ins AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
++ : VOP3 <op, (outs VReg_32:$dst), (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
+
+class VOP3_64 <bits<9> op, string opName, list<dag> pattern>
-+ : VOP3 <op, (outs VReg_64:$dst), (ins AllReg_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
-+
++ : VOP3 <op, (outs VReg_64:$dst), (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
+
+class SOP1_32 <bits<8> op, string opName, list<dag> pattern>
-+ : SOP1 <op, (outs SReg_32:$dst), (ins SReg_32:$src0), opName, pattern>;
++ : SOP1 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0), opName, pattern>;
+
+class SOP1_64 <bits<8> op, string opName, list<dag> pattern>
-+ : SOP1 <op, (outs SReg_64:$dst), (ins SReg_64:$src0), opName, pattern>;
++ : SOP1 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0), opName, pattern>;
+
+class SOP2_32 <bits<7> op, string opName, list<dag> pattern>
-+ : SOP2 <op, (outs SReg_32:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
++ : SOP2 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
+
+class SOP2_64 <bits<7> op, string opName, list<dag> pattern>
-+ : SOP2 <op, (outs SReg_64:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
-+
-+class SOP2_VCC <bits<7> op, string opName, list<dag> pattern>
-+ : SOP2 <op, (outs SReg_1:$vcc), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
++ : SOP2 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
+
+class VOP1_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
+ string opName, list<dag> pattern> :
@@ -18708,7 +20262,7 @@ index 0000000..aea3b5a
+ >;
+
+multiclass VOP1_32 <bits<8> op, string opName, list<dag> pattern> {
-+ def _e32: VOP1_Helper <op, VReg_32, AllReg_32, opName, pattern>;
++ def _e32: VOP1_Helper <op, VReg_32, VSrc_32, opName, pattern>;
+ def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
+ opName, []
+ >;
@@ -18716,7 +20270,7 @@ index 0000000..aea3b5a
+
+multiclass VOP1_64 <bits<8> op, string opName, list<dag> pattern> {
+
-+ def _e32 : VOP1_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++ def _e32 : VOP1_Helper <op, VReg_64, VSrc_64, opName, pattern>;
+
+ def _e64 : VOP3_64 <
+ {1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
@@ -18732,7 +20286,7 @@ index 0000000..aea3b5a
+
+multiclass VOP2_32 <bits<6> op, string opName, list<dag> pattern> {
+
-+ def _e32 : VOP2_Helper <op, VReg_32, AllReg_32, opName, pattern>;
++ def _e32 : VOP2_Helper <op, VReg_32, VSrc_32, opName, pattern>;
+
+ def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
+ opName, []
@@ -18740,7 +20294,7 @@ index 0000000..aea3b5a
+}
+
+multiclass VOP2_64 <bits<6> op, string opName, list<dag> pattern> {
-+ def _e32: VOP2_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++ def _e32: VOP2_Helper <op, VReg_64, VSrc_64, opName, pattern>;
+
+ def _e64 : VOP3_64 <
+ {1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
@@ -18754,47 +20308,106 @@ index 0000000..aea3b5a
+class SOPK_64 <bits<5> op, string opName, list<dag> pattern>
+ : SOPK <op, (outs SReg_64:$dst), (ins i16imm:$src0), opName, pattern>;
+
-+class VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
-+ string opName, list<dag> pattern> :
-+ VOPC <
-+ op, (ins arc:$src0, vrc:$src1), opName, pattern
-+ >;
++multiclass VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
++ string opName, list<dag> pattern> {
+
-+multiclass VOPC_32 <bits<9> op, string opName, list<dag> pattern> {
++ def _e32 : VOPC <op, (ins arc:$src0, vrc:$src1), opName, pattern>;
++ def _e64 : VOP3 <
++ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
++ (outs SReg_64:$dst),
++ (ins arc:$src0, vrc:$src1,
++ InstFlag:$abs, InstFlag:$clamp,
++ InstFlag:$omod, InstFlag:$neg),
++ opName, pattern
++ > {
++ let SRC2 = 0x80;
++ }
++}
+
-+ def _e32 : VOPC_Helper <
-+ {op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-+ VReg_32, AllReg_32, opName, pattern
-+ >;
++multiclass VOPC_32 <bits<8> op, string opName, list<dag> pattern>
++ : VOPC_Helper <op, VReg_32, VSrc_32, opName, pattern>;
+
-+ def _e64 : VOP3_1_32 <
-+ op,
-+ opName, pattern
-+ >;
++multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern>
++ : VOPC_Helper <op, VReg_64, VSrc_64, opName, pattern>;
++
++class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
++
++class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
++ : SOPC <op, (outs SCCReg:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
++
++class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
++ op,
++ (outs VReg_128:$vdata),
++ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
++ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
++ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
++ asm,
++ []> {
++ let mayLoad = 1;
++ let mayStore = 0;
+}
+
-+multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern> {
++class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
++ op,
++ (outs),
++ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
++ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
++ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
++ asm,
++ []> {
++ let mayStore = 1;
++ let mayLoad = 0;
++}
+
-+ def _e32 : VOPC_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
++ op,
++ (outs regClass:$dst),
++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
++ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
++ i1imm:$tfe, SSrc_32:$soffset),
++ asm,
++ []> {
++ let mayLoad = 1;
++ let mayStore = 0;
++}
+
-+ def _e64 : VOP3_64 <
-+ {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-+ opName, []
-+ >;
++class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
++ op,
++ (outs regClass:$dst),
++ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
++ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
++ i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
++ asm,
++ []> {
++ let mayLoad = 1;
++ let mayStore = 0;
+}
+
-+class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
-+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
++multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass> {
++ def _IMM : SMRD <
++ op, 1,
++ (outs dstClass:$dst),
++ (ins GPR2Align<SReg_64>:$sbase, i32imm:$offset),
++ asm,
++ []
++ >;
+
-+class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
-+ : SOPC <op, (outs SCCReg:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
++ def _SGPR : SMRD <
++ op, 0,
++ (outs dstClass:$dst),
++ (ins GPR2Align<SReg_64>:$sbase, SReg_32:$soff),
++ asm,
++ []
++ >;
++}
+
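A pattern worth noting in the multiclasses above: every _e64 variant reuses its
_e32 opcode inside the shared 9-bit VOP3 opcode space by prefixing fixed bits,
so VOPC keeps its eight bits under a leading 0, VOP2 gains a leading 100, and
VOP1 a leading 11. Spelled out as plain arithmetic (a hedged reading of the
TableGen bit lists, not code from the patch):

    // Hypothetical helpers restating the bit patterns above.
    unsigned vop3FromVOPC(unsigned Op) { return Op; }          // {0, op[7:0]}
    unsigned vop3FromVOP2(unsigned Op) { return 0x100 | Op; }  // {1,0,0, op[5:0]}
    unsigned vop3FromVOP1(unsigned Op) { return 0x180 | Op; }  // {1,1, op[6:0]}
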
diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
new file mode 100644
-index 0000000..adcffa8
+index 0000000..1c4b3cf
--- /dev/null
+++ b/lib/Target/R600/SIInstrInfo.cpp
-@@ -0,0 +1,90 @@
+@@ -0,0 +1,143 @@
+//===-- SIInstrInfo.cpp - SI Instruction Information ---------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -18839,7 +20452,15 @@ index 0000000..adcffa8
+ // never be necessary.
+ assert(DestReg != AMDGPU::SCC && SrcReg != AMDGPU::SCC);
+
-+ if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
++ if (AMDGPU::VReg_64RegClass.contains(DestReg)) {
++ assert(AMDGPU::VReg_64RegClass.contains(SrcReg) ||
++ AMDGPU::SReg_64RegClass.contains(SrcReg));
++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub0))
++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub0), getKillRegState(KillSrc))
++ .addReg(DestReg, RegState::Define | RegState::Implicit);
++ BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub1))
++ .addReg(RI.getSubReg(SrcReg, AMDGPU::sub1), getKillRegState(KillSrc));
++ } else if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
+ assert(AMDGPU::SReg_64RegClass.contains(SrcReg));
+ BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B64), DestReg)
+ .addReg(SrcReg, getKillRegState(KillSrc));
@@ -18858,8 +20479,8 @@ index 0000000..adcffa8
+
+MachineInstr * SIInstrInfo::getMovImmInstr(MachineFunction *MF, unsigned DstReg,
+ int64_t Imm) const {
-+ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_IMM_I32), DebugLoc());
-+ MachineInstrBuilder MIB(*MF, MI);
++ MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_B32_e32), DebugLoc());
++ MachineInstrBuilder MIB(MI);
+ MIB.addReg(DstReg, RegState::Define);
+ MIB.addImm(Imm);
+
@@ -18874,9 +20495,6 @@ index 0000000..adcffa8
+ case AMDGPU::S_MOV_B64:
+ case AMDGPU::V_MOV_B32_e32:
+ case AMDGPU::V_MOV_B32_e64:
-+ case AMDGPU::V_MOV_IMM_F32:
-+ case AMDGPU::V_MOV_IMM_I32:
-+ case AMDGPU::S_MOV_IMM_I32:
+ return true;
+ }
+}
@@ -18885,12 +20503,60 @@ index 0000000..adcffa8
+SIInstrInfo::isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
+ return RC != &AMDGPU::EXECRegRegClass;
+}
++
++//===----------------------------------------------------------------------===//
++// Indirect addressing callbacks
++//===----------------------------------------------------------------------===//
++
++unsigned SIInstrInfo::calculateIndirectAddress(unsigned RegIndex,
++ unsigned Channel) const {
++ assert(Channel == 0);
++ return RegIndex;
++}
++
++
++int SIInstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
++ llvm_unreachable("Unimplemented");
++}
++
++int SIInstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
++ llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrStoreRegClass(
++ unsigned SourceReg) const {
++ llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrLoadRegClass() const {
++ llvm_unreachable("Unimplemented");
++}
++
++MachineInstrBuilder SIInstrInfo::buildIndirectWrite(
++ MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg,
++ unsigned Address, unsigned OffsetReg) const {
++ llvm_unreachable("Unimplemented");
++}
++
++MachineInstrBuilder SIInstrInfo::buildIndirectRead(
++ MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg,
++ unsigned Address, unsigned OffsetReg) const {
++ llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getSuperIndirectRegClass() const {
++ llvm_unreachable("Unimplemented");
++}
diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h
new file mode 100644
-index 0000000..631f6c0
+index 0000000..a65f7b6
--- /dev/null
+++ b/lib/Target/R600/SIInstrInfo.h
-@@ -0,0 +1,62 @@
+@@ -0,0 +1,84 @@
+//===-- SIInstrInfo.h - SI Instruction Info Interface ---------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -18928,12 +20594,6 @@ index 0000000..631f6c0
+ unsigned DestReg, unsigned SrcReg,
+ bool KillSrc) const;
+
-+ /// \returns the encoding type of this instruction.
-+ unsigned getEncodingType(const MachineInstr &MI) const;
-+
-+ /// \returns the size of this instructions encoding in number of bytes.
-+ unsigned getEncodingBytes(const MachineInstr &MI) const;
-+
+ virtual MachineInstr * getMovImmInstr(MachineFunction *MF, unsigned DstReg,
+ int64_t Imm) const;
+
@@ -18941,6 +20601,32 @@ index 0000000..631f6c0
+ virtual bool isMov(unsigned Opcode) const;
+
+ virtual bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const;
++
++ virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
++
++ virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
++
++ virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++ unsigned Channel) const;
++
++ virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++ unsigned SourceReg) const;
++
++ virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
++
++ virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg,
++ unsigned Address,
++ unsigned OffsetReg) const;
++
++ virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++ MachineBasicBlock::iterator I,
++ unsigned ValueReg,
++ unsigned Address,
++ unsigned OffsetReg) const;
++
++ virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
+ };
+
+} // End namespace llvm
@@ -18948,17 +20634,19 @@ index 0000000..631f6c0
+namespace SIInstrFlags {
+ enum Flags {
+ // First 4 bits are the instruction encoding
-+ NEED_WAIT = 1 << 4
++ VM_CNT = 1 << 0,
++ EXP_CNT = 1 << 1,
++ LGKM_CNT = 1 << 2
+ };
+}
+
+#endif //SIINSTRINFO_H
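These flag values line up with the TSFlags{0}..TSFlags{2} assignments in
SIInstrInfo.td below, so client passes can test an instruction with a simple
mask, exactly as SIInsertWaits::getHwCounts does. Usage sketch (assumes MI and
TII in scope, as in that pass):

    uint64_t TSFlags = TII->get(MI.getOpcode()).TSFlags;
    bool IncVM   = TSFlags & SIInstrFlags::VM_CNT;    // bit 0
    bool IncEXP  = TSFlags & SIInstrFlags::EXP_CNT;   // bit 1
    bool IncLGKM = TSFlags & SIInstrFlags::LGKM_CNT;  // bit 2
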
diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
new file mode 100644
-index 0000000..873a451
+index 0000000..8c4e5af
--- /dev/null
+++ b/lib/Target/R600/SIInstrInfo.td
-@@ -0,0 +1,589 @@
+@@ -0,0 +1,465 @@
+//===-- SIInstrInfo.td - SI Instruction Encodings ---------*- tablegen -*--===//
+//
+// The LLVM Compiler Infrastructure
@@ -18969,57 +20657,66 @@ index 0000000..873a451
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
-+// SI DAG Profiles
-+//===----------------------------------------------------------------------===//
-+def SDTVCCBinaryOp : SDTypeProfile<1, 2, [
-+ SDTCisInt<0>, SDTCisInt<1>, SDTCisSameAs<1, 2>
-+]>;
-+
-+//===----------------------------------------------------------------------===//
+// SI DAG Nodes
+//===----------------------------------------------------------------------===//
+
-+// and operation on 64-bit wide vcc
-+def SIsreg1_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
-+ [SDNPCommutative, SDNPAssociative]
++// SMRD takes a 64bit memory address and can only add a 32bit offset
++def SIadd64bit32bit : SDNode<"ISD::ADD",
++ SDTypeProfile<1, 2, [SDTCisSameAs<0, 1>, SDTCisVT<0, i64>, SDTCisVT<2, i32>]>
+>;
+
-+// Special bitcast node for sharing VCC register between VALU and SALU
-+def SIsreg1_bitcast : SDNode<"SIISD::VCC_BITCAST",
-+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
-+>;
++// Transformation function, extract the lower 32bit of a 64bit immediate
++def LO32 : SDNodeXForm<imm, [{
++ return CurDAG->getTargetConstant(N->getZExtValue() & 0xffffffff, MVT::i32);
++}]>;
+
-+// and operation on 64-bit wide vcc
-+def SIvcc_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
-+ [SDNPCommutative, SDNPAssociative]
++// Transformation function, extract the upper 32bit of a 64bit immediate
++def HI32 : SDNodeXForm<imm, [{
++ return CurDAG->getTargetConstant(N->getZExtValue() >> 32, MVT::i32);
++}]>;
++
++def IMM8bitDWORD : ImmLeaf <
++ i32, [{
++ return (Imm & ~0x3FC) == 0;
++ }], SDNodeXForm<imm, [{
++ return CurDAG->getTargetConstant(
++ N->getZExtValue() >> 2, MVT::i32);
++ }]>
+>;
+
-+// Special bitcast node for sharing VCC register between VALU and SALU
-+def SIvcc_bitcast : SDNode<"SIISD::VCC_BITCAST",
-+ SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
++def IMM12bit : ImmLeaf <
++ i16,
++ [{return isUInt<12>(Imm);}]
+>;
+
++class InlineImm <ValueType vt> : ImmLeaf <vt, [{
++ return -16 <= Imm && Imm <= 64;
++}]>;
++
+class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
+ AMDGPUInst<outs, ins, asm, pattern> {
+
-+ field bits<4> EncodingType = 0;
-+ field bits<1> NeedWait = 0;
-+
-+ let TSFlags{3-0} = EncodingType;
-+ let TSFlags{4} = NeedWait;
++ field bits<1> VM_CNT = 0;
++ field bits<1> EXP_CNT = 0;
++ field bits<1> LGKM_CNT = 0;
+
++ let TSFlags{0} = VM_CNT;
++ let TSFlags{1} = EXP_CNT;
++ let TSFlags{2} = LGKM_CNT;
+}
+
+class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> :
+ InstSI <outs, ins, asm, pattern> {
+
+ field bits<32> Inst;
++ let Size = 4;
+}
+
+class Enc64 <dag outs, dag ins, string asm, list<dag> pattern> :
+ InstSI <outs, ins, asm, pattern> {
+
+ field bits<64> Inst;
++ let Size = 8;
+}
+
+class SIOperand <ValueType vt, dag opInfo>: Operand <vt> {
@@ -19027,49 +20724,16 @@ index 0000000..873a451
+ let MIOperandInfo = opInfo;
+}
+
-+def IMM16bit : ImmLeaf <
-+ i16,
-+ [{return isInt<16>(Imm);}]
-+>;
-+
-+def IMM8bit : ImmLeaf <
-+ i32,
-+ [{return (int32_t)Imm >= 0 && (int32_t)Imm <= 0xff;}]
-+>;
-+
-+def IMM12bit : ImmLeaf <
-+ i16,
-+ [{return (int16_t)Imm >= 0 && (int16_t)Imm <= 0xfff;}]
-+>;
-+
-+def IMM32bitIn64bit : ImmLeaf <
-+ i64,
-+ [{return isInt<32>(Imm);}]
-+>;
-+
+class GPR4Align <RegisterClass rc> : Operand <vAny> {
+ let EncoderMethod = "GPR4AlignEncode";
+ let MIOperandInfo = (ops rc:$reg);
+}
+
-+class GPR2Align <RegisterClass rc, ValueType vt> : Operand <vt> {
++class GPR2Align <RegisterClass rc> : Operand <iPTR> {
+ let EncoderMethod = "GPR2AlignEncode";
+ let MIOperandInfo = (ops rc:$reg);
+}
+
-+def SMRDmemrr : Operand<iPTR> {
-+ let MIOperandInfo = (ops SReg_64, SReg_32);
-+ let EncoderMethod = "GPR2AlignEncode";
-+}
-+
-+def SMRDmemri : Operand<iPTR> {
-+ let MIOperandInfo = (ops SReg_64, i32imm);
-+ let EncoderMethod = "SMRDmemriEncode";
-+}
-+
-+def ADDR_Reg : ComplexPattern<i64, 2, "SelectADDRReg", [], []>;
-+def ADDR_Offset8 : ComplexPattern<i64, 2, "SelectADDR8BitOffset", [], []>;
-+
+let Uses = [EXEC] in {
+
+def EXP : Enc64<
@@ -19099,10 +20763,8 @@ index 0000000..873a451
+ let Inst{47-40} = VSRC1;
+ let Inst{55-48} = VSRC2;
+ let Inst{63-56} = VSRC3;
-+ let EncodingType = 0; //SIInstrEncodingType::EXP
+
-+ let NeedWait = 1;
-+ let usesCustomInserter = 1;
++ let EXP_CNT = 1;
+}
+
+class MIMG <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19136,10 +20798,8 @@ index 0000000..873a451
+ let Inst{52-48} = SRSRC;
+ let Inst{57-53} = SSAMP;
+
-+ let EncodingType = 2; //SIInstrEncodingType::MIMG
-+
-+ let NeedWait = 1;
-+ let usesCustomInserter = 1;
++ let VM_CNT = 1;
++ let EXP_CNT = 1;
+}
+
+class MTBUF <bits<3> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19174,10 +20834,10 @@ index 0000000..873a451
+ let Inst{54} = SLC;
+ let Inst{55} = TFE;
+ let Inst{63-56} = SOFFSET;
-+ let EncodingType = 3; //SIInstrEncodingType::MTBUF
+
-+ let NeedWait = 1;
-+ let usesCustomInserter = 1;
++ let VM_CNT = 1;
++ let EXP_CNT = 1;
++
+ let neverHasSideEffects = 1;
+}
+
@@ -19211,34 +20871,30 @@ index 0000000..873a451
+ let Inst{54} = SLC;
+ let Inst{55} = TFE;
+ let Inst{63-56} = SOFFSET;
-+ let EncodingType = 4; //SIInstrEncodingType::MUBUF
+
-+ let NeedWait = 1;
-+ let usesCustomInserter = 1;
++ let VM_CNT = 1;
++ let EXP_CNT = 1;
++
+ let neverHasSideEffects = 1;
+}
+
+} // End Uses = [EXEC]
+
-+class SMRD <bits<5> op, dag outs, dag ins, string asm, list<dag> pattern> :
-+ Enc32<outs, ins, asm, pattern> {
++class SMRD <bits<5> op, bits<1> imm, dag outs, dag ins, string asm,
++ list<dag> pattern> : Enc32<outs, ins, asm, pattern> {
+
+ bits<7> SDST;
-+ bits<15> PTR;
-+ bits<8> OFFSET = PTR{7-0};
-+ bits<1> IMM = PTR{8};
-+ bits<6> SBASE = PTR{14-9};
++ bits<6> SBASE;
++ bits<8> OFFSET;
+
+ let Inst{7-0} = OFFSET;
-+ let Inst{8} = IMM;
++ let Inst{8} = imm;
+ let Inst{14-9} = SBASE;
+ let Inst{21-15} = SDST;
+ let Inst{26-22} = op;
+ let Inst{31-27} = 0x18; //encoding
-+ let EncodingType = 5; //SIInstrEncodingType::SMRD
+
-+ let NeedWait = 1;
-+ let usesCustomInserter = 1;
++ let LGKM_CNT = 1;
+}
+
+class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19251,7 +20907,6 @@ index 0000000..873a451
+ let Inst{15-8} = op;
+ let Inst{22-16} = SDST;
+ let Inst{31-23} = 0x17d; //encoding;
-+ let EncodingType = 6; //SIInstrEncodingType::SOP1
+
+ let mayLoad = 0;
+ let mayStore = 0;
@@ -19270,7 +20925,6 @@ index 0000000..873a451
+ let Inst{22-16} = SDST;
+ let Inst{29-23} = op;
+ let Inst{31-30} = 0x2; // encoding
-+ let EncodingType = 7; // SIInstrEncodingType::SOP2
+
+ let mayLoad = 0;
+ let mayStore = 0;
@@ -19287,7 +20941,6 @@ index 0000000..873a451
+ let Inst{15-8} = SSRC1;
+ let Inst{22-16} = op;
+ let Inst{31-23} = 0x17e;
-+ let EncodingType = 8; // SIInstrEncodingType::SOPC
+
+ let DisableEncoding = "$dst";
+ let mayLoad = 0;
@@ -19305,7 +20958,6 @@ index 0000000..873a451
+ let Inst{22-16} = SDST;
+ let Inst{27-23} = op;
+ let Inst{31-28} = 0xb; //encoding
-+ let EncodingType = 9; // SIInstrEncodingType::SOPK
+
+ let mayLoad = 0;
+ let mayStore = 0;
@@ -19323,7 +20975,6 @@ index 0000000..873a451
+ let Inst{15-0} = SIMM16;
+ let Inst{22-16} = op;
+ let Inst{31-23} = 0x17f; // encoding
-+ let EncodingType = 10; // SIInstrEncodingType::SOPP
+
+ let mayLoad = 0;
+ let mayStore = 0;
@@ -19346,7 +20997,6 @@ index 0000000..873a451
+ let Inst{17-16} = op;
+ let Inst{25-18} = VDST;
+ let Inst{31-26} = 0x32; // encoding
-+ let EncodingType = 11; // SIInstrEncodingType::VINTRP
+
+ let neverHasSideEffects = 1;
+ let mayLoad = 1;
@@ -19364,9 +21014,6 @@ index 0000000..873a451
+ let Inst{24-17} = VDST;
+ let Inst{31-25} = 0x3f; //encoding
+
-+ let EncodingType = 12; // SIInstrEncodingType::VOP1
-+ let PostEncoderMethod = "VOPPostEncode";
-+
+ let mayLoad = 0;
+ let mayStore = 0;
+ let hasSideEffects = 0;
@@ -19385,9 +21032,6 @@ index 0000000..873a451
+ let Inst{30-25} = op;
+ let Inst{31} = 0x0; //encoding
+
-+ let EncodingType = 13; // SIInstrEncodingType::VOP2
-+ let PostEncoderMethod = "VOPPostEncode";
-+
+ let mayLoad = 0;
+ let mayStore = 0;
+ let hasSideEffects = 0;
@@ -19416,9 +21060,6 @@ index 0000000..873a451
+ let Inst{60-59} = OMOD;
+ let Inst{63-61} = NEG;
+
-+ let EncodingType = 14; // SIInstrEncodingType::VOP3
-+ let PostEncoderMethod = "VOPPostEncode";
-+
+ let mayLoad = 0;
+ let mayStore = 0;
+ let hasSideEffects = 0;
@@ -19433,127 +21074,50 @@ index 0000000..873a451
+ bits<9> SRC2;
+ bits<7> SDST;
+ bits<2> OMOD;
-+ bits<3> NEG;
-+
-+ let Inst{7-0} = VDST;
-+ let Inst{14-8} = SDST;
-+ let Inst{25-17} = op;
-+ let Inst{31-26} = 0x34; //encoding
-+ let Inst{40-32} = SRC0;
-+ let Inst{49-41} = SRC1;
-+ let Inst{58-50} = SRC2;
-+ let Inst{60-59} = OMOD;
-+ let Inst{63-61} = NEG;
-+
-+ let EncodingType = 14; // SIInstrEncodingType::VOP3
-+ let PostEncoderMethod = "VOPPostEncode";
-+
-+ let mayLoad = 0;
-+ let mayStore = 0;
-+ let hasSideEffects = 0;
-+}
-+
-+class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
-+ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
-+
-+ bits<9> SRC0;
-+ bits<8> VSRC1;
-+
-+ let Inst{8-0} = SRC0;
-+ let Inst{16-9} = VSRC1;
-+ let Inst{24-17} = op;
-+ let Inst{31-25} = 0x3e;
-+
-+ let EncodingType = 15; //SIInstrEncodingType::VOPC
-+ let PostEncoderMethod = "VOPPostEncode";
-+ let DisableEncoding = "$dst";
-+ let mayLoad = 0;
-+ let mayStore = 0;
-+ let hasSideEffects = 0;
-+}
-+
-+} // End Uses = [EXEC]
-+
-+class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
-+ op,
-+ (outs VReg_128:$vdata),
-+ (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
-+ i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_128:$vaddr,
-+ GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
-+ asm,
-+ []> {
-+ let mayLoad = 1;
-+ let mayStore = 0;
-+}
-+
-+class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
-+ op,
-+ (outs regClass:$dst),
-+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-+ i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
-+ i1imm:$tfe, SReg_32:$soffset),
-+ asm,
-+ []> {
-+ let mayLoad = 1;
-+ let mayStore = 0;
-+}
++ bits<3> NEG;
+
-+class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
-+ op,
-+ (outs regClass:$dst),
-+ (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-+ i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
-+ i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-+ asm,
-+ []> {
-+ let mayLoad = 1;
-+ let mayStore = 0;
-+}
++ let Inst{7-0} = VDST;
++ let Inst{14-8} = SDST;
++ let Inst{25-17} = op;
++ let Inst{31-26} = 0x34; //encoding
++ let Inst{40-32} = SRC0;
++ let Inst{49-41} = SRC1;
++ let Inst{58-50} = SRC2;
++ let Inst{60-59} = OMOD;
++ let Inst{63-61} = NEG;
+
-+class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
-+ op,
-+ (outs),
-+ (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
-+ i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
-+ GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-+ asm,
-+ []> {
-+ let mayStore = 1;
+ let mayLoad = 0;
++ let mayStore = 0;
++ let hasSideEffects = 0;
+}
+
-+multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass,
-+ ValueType vt> {
-+ def _IMM : SMRD <
-+ op,
-+ (outs dstClass:$dst),
-+ (ins SMRDmemri:$src0),
-+ asm,
-+ [(set (vt dstClass:$dst), (constant_load ADDR_Offset8:$src0))]
-+ >;
++class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
++ Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
+
-+ def _SGPR : SMRD <
-+ op,
-+ (outs dstClass:$dst),
-+ (ins SMRDmemrr:$src0),
-+ asm,
-+ [(set (vt dstClass:$dst), (constant_load ADDR_Reg:$src0))]
-+ >;
-+}
++ bits<9> SRC0;
++ bits<8> VSRC1;
+
-+multiclass SMRD_32 <bits<5> op, string asm, RegisterClass dstClass> {
-+ defm _F32 : SMRD_Helper <op, asm, dstClass, f32>;
-+ defm _I32 : SMRD_Helper <op, asm, dstClass, i32>;
++ let Inst{8-0} = SRC0;
++ let Inst{16-9} = VSRC1;
++ let Inst{24-17} = op;
++ let Inst{31-25} = 0x3e;
++
++ let DisableEncoding = "$dst";
++ let mayLoad = 0;
++ let mayStore = 0;
++ let hasSideEffects = 0;
+}
+
++} // End Uses = [EXEC]
++
+include "SIInstrFormats.td"
+include "SIInstructions.td"
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
new file mode 100644
-index 0000000..005be96
+index 0000000..3a9822a
--- /dev/null
+++ b/lib/Target/R600/SIInstructions.td
-@@ -0,0 +1,1351 @@
+@@ -0,0 +1,1462 @@
+//===-- SIInstructions.td - SI Instruction Definitions --------------------===//
+//
+// The LLVM Compiler Infrastructure
@@ -19567,6 +21131,17 @@ index 0000000..005be96
+// that are not yet supported remain commented out.
+//===----------------------------------------------------------------------===//
+
++class InterpSlots {
++int P0 = 2;
++int P10 = 0;
++int P20 = 1;
++}
++def INTERP : InterpSlots;
++
++def InterpSlot : Operand<i32> {
++ let PrintMethod = "printInterpSlot";
++}
++
+def isSI : Predicate<"Subtarget.device()"
+ "->getGeneration() == AMDGPUDeviceInfo::HD7XXX">;
+
@@ -19675,33 +21250,33 @@ index 0000000..005be96
+defm V_CMP_F_F32 : VOPC_32 <0x00000000, "V_CMP_F_F32", []>;
+defm V_CMP_LT_F32 : VOPC_32 <0x00000001, "V_CMP_LT_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
-+ (V_CMP_LT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
++ (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_EQ_F32 : VOPC_32 <0x00000002, "V_CMP_EQ_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
-+ (V_CMP_EQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
++ (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_LE_F32 : VOPC_32 <0x00000003, "V_CMP_LE_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
-+ (V_CMP_LE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
++ (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_GT_F32 : VOPC_32 <0x00000004, "V_CMP_GT_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
-+ (V_CMP_GT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
++ (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_LG_F32 : VOPC_32 <0x00000005, "V_CMP_LG_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+ (V_CMP_LG_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++ (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_GE_F32 : VOPC_32 <0x00000006, "V_CMP_GE_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
-+ (V_CMP_GE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
++ (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_O_F32 : VOPC_32 <0x00000007, "V_CMP_O_F32", []>;
+defm V_CMP_U_F32 : VOPC_32 <0x00000008, "V_CMP_U_F32", []>;
@@ -19711,8 +21286,8 @@ index 0000000..005be96
+defm V_CMP_NLE_F32 : VOPC_32 <0x0000000c, "V_CMP_NLE_F32", []>;
+defm V_CMP_NEQ_F32 : VOPC_32 <0x0000000d, "V_CMP_NEQ_F32", []>;
+def : Pat <
-+ (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+ (V_CMP_NEQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++ (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_NLT_F32 : VOPC_32 <0x0000000e, "V_CMP_NLT_F32", []>;
+defm V_CMP_TRU_F32 : VOPC_32 <0x0000000f, "V_CMP_TRU_F32", []>;
@@ -19845,33 +21420,33 @@ index 0000000..005be96
+defm V_CMP_F_I32 : VOPC_32 <0x00000080, "V_CMP_F_I32", []>;
+defm V_CMP_LT_I32 : VOPC_32 <0x00000081, "V_CMP_LT_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
-+ (V_CMP_LT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
++ (V_CMP_LT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_EQ_I32 : VOPC_32 <0x00000082, "V_CMP_EQ_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
-+ (V_CMP_EQ_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
++ (V_CMP_EQ_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_LE_I32 : VOPC_32 <0x00000083, "V_CMP_LE_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
-+ (V_CMP_LE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
++ (V_CMP_LE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_GT_I32 : VOPC_32 <0x00000084, "V_CMP_GT_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
-+ (V_CMP_GT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
++ (V_CMP_GT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_NE_I32 : VOPC_32 <0x00000085, "V_CMP_NE_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+ (V_CMP_NE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++ (V_CMP_NE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_GE_I32 : VOPC_32 <0x00000086, "V_CMP_GE_I32", []>;
+def : Pat <
-+ (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
-+ (V_CMP_GE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++ (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
++ (V_CMP_GE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_CMP_T_I32 : VOPC_32 <0x00000087, "V_CMP_T_I32", []>;
+
@@ -20017,11 +21592,13 @@ index 0000000..005be96
+//def TBUFFER_STORE_FORMAT_XYZ : MTBUF_ <0x00000006, "TBUFFER_STORE_FORMAT_XYZ", []>;
+//def TBUFFER_STORE_FORMAT_XYZW : MTBUF_ <0x00000007, "TBUFFER_STORE_FORMAT_XYZW", []>;
+
-+defm S_LOAD_DWORD : SMRD_32 <0x00000000, "S_LOAD_DWORD", SReg_32>;
++let mayLoad = 1 in {
++
++defm S_LOAD_DWORD : SMRD_Helper <0x00000000, "S_LOAD_DWORD", SReg_32>;
+
+//def S_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000001, "S_LOAD_DWORDX2", []>;
-+defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128, v4i32>;
-+defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256, v8i32>;
++defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128>;
++defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256>;
+//def S_LOAD_DWORDX16 : SMRD_DWORDX16 <0x00000004, "S_LOAD_DWORDX16", []>;
+//def S_BUFFER_LOAD_DWORD : SMRD_ <0x00000008, "S_BUFFER_LOAD_DWORD", []>;
+//def S_BUFFER_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000009, "S_BUFFER_LOAD_DWORDX2", []>;
@@ -20029,6 +21606,8 @@ index 0000000..005be96
+//def S_BUFFER_LOAD_DWORDX8 : SMRD_DWORDX8 <0x0000000b, "S_BUFFER_LOAD_DWORDX8", []>;
+//def S_BUFFER_LOAD_DWORDX16 : SMRD_DWORDX16 <0x0000000c, "S_BUFFER_LOAD_DWORDX16", []>;
+
++} // mayLoad = 1
++
+//def S_MEMTIME : SMRD_ <0x0000001e, "S_MEMTIME", []>;
+//def S_DCACHE_INV : SMRD_ <0x0000001f, "S_DCACHE_INV", []>;
+//def IMAGE_LOAD : MIMG_NoPattern_ <"IMAGE_LOAD", 0x00000000>;
@@ -20067,12 +21646,12 @@ index 0000000..005be96
+def IMAGE_SAMPLE_B : MIMG_Load_Helper <0x00000025, "IMAGE_SAMPLE_B">;
+//def IMAGE_SAMPLE_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_B_CL", 0x00000026>;
+//def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x00000027>;
-+//def IMAGE_SAMPLE_C : MIMG_NoPattern_ <"IMAGE_SAMPLE_C", 0x00000028>;
++def IMAGE_SAMPLE_C : MIMG_Load_Helper <0x00000028, "IMAGE_SAMPLE_C">;
+//def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x00000029>;
+//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x0000002a>;
+//def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x0000002b>;
-+//def IMAGE_SAMPLE_C_L : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_L", 0x0000002c>;
-+//def IMAGE_SAMPLE_C_B : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B", 0x0000002d>;
++def IMAGE_SAMPLE_C_L : MIMG_Load_Helper <0x0000002c, "IMAGE_SAMPLE_C_L">;
++def IMAGE_SAMPLE_C_B : MIMG_Load_Helper <0x0000002d, "IMAGE_SAMPLE_C_B">;
+//def IMAGE_SAMPLE_C_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B_CL", 0x0000002e>;
+//def IMAGE_SAMPLE_C_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_LZ", 0x0000002f>;
+//def IMAGE_SAMPLE_O : MIMG_NoPattern_ <"IMAGE_SAMPLE_O", 0x00000030>;
@@ -20135,12 +21714,12 @@ index 0000000..005be96
+//defm V_CVT_I32_F64 : VOP1_32 <0x00000003, "V_CVT_I32_F64", []>;
+//defm V_CVT_F64_I32 : VOP1_64 <0x00000004, "V_CVT_F64_I32", []>;
+defm V_CVT_F32_I32 : VOP1_32 <0x00000005, "V_CVT_F32_I32",
-+ [(set VReg_32:$dst, (sint_to_fp AllReg_32:$src0))]
++ [(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))]
+>;
+//defm V_CVT_F32_U32 : VOP1_32 <0x00000006, "V_CVT_F32_U32", []>;
+//defm V_CVT_U32_F32 : VOP1_32 <0x00000007, "V_CVT_U32_F32", []>;
+defm V_CVT_I32_F32 : VOP1_32 <0x00000008, "V_CVT_I32_F32",
-+ [(set VReg_32:$dst, (fp_to_sint AllReg_32:$src0))]
++ [(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))]
+>;
+defm V_MOV_FED_B32 : VOP1_32 <0x00000009, "V_MOV_FED_B32", []>;
+////def V_CVT_F16_F32 : VOP1_F16 <0x0000000a, "V_CVT_F16_F32", []>;
@@ -20157,31 +21736,35 @@ index 0000000..005be96
+//defm V_CVT_U32_F64 : VOP1_32 <0x00000015, "V_CVT_U32_F64", []>;
+//defm V_CVT_F64_U32 : VOP1_64 <0x00000016, "V_CVT_F64_U32", []>;
+defm V_FRACT_F32 : VOP1_32 <0x00000020, "V_FRACT_F32",
-+ [(set VReg_32:$dst, (AMDGPUfract AllReg_32:$src0))]
++ [(set VReg_32:$dst, (AMDGPUfract VSrc_32:$src0))]
+>;
+defm V_TRUNC_F32 : VOP1_32 <0x00000021, "V_TRUNC_F32", []>;
-+defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32", []>;
++defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32",
++ [(set VReg_32:$dst, (fceil VSrc_32:$src0))]
++>;
+defm V_RNDNE_F32 : VOP1_32 <0x00000023, "V_RNDNE_F32",
-+ [(set VReg_32:$dst, (frint AllReg_32:$src0))]
++ [(set VReg_32:$dst, (frint VSrc_32:$src0))]
+>;
+defm V_FLOOR_F32 : VOP1_32 <0x00000024, "V_FLOOR_F32",
-+ [(set VReg_32:$dst, (ffloor AllReg_32:$src0))]
++ [(set VReg_32:$dst, (ffloor VSrc_32:$src0))]
+>;
+defm V_EXP_F32 : VOP1_32 <0x00000025, "V_EXP_F32",
-+ [(set VReg_32:$dst, (fexp2 AllReg_32:$src0))]
++ [(set VReg_32:$dst, (fexp2 VSrc_32:$src0))]
+>;
+defm V_LOG_CLAMP_F32 : VOP1_32 <0x00000026, "V_LOG_CLAMP_F32", []>;
-+defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32", []>;
++defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32",
++ [(set VReg_32:$dst, (flog2 VSrc_32:$src0))]
++>;
+defm V_RCP_CLAMP_F32 : VOP1_32 <0x00000028, "V_RCP_CLAMP_F32", []>;
+defm V_RCP_LEGACY_F32 : VOP1_32 <0x00000029, "V_RCP_LEGACY_F32", []>;
+defm V_RCP_F32 : VOP1_32 <0x0000002a, "V_RCP_F32",
-+ [(set VReg_32:$dst, (fdiv FP_ONE, AllReg_32:$src0))]
++ [(set VReg_32:$dst, (fdiv FP_ONE, VSrc_32:$src0))]
+>;
+defm V_RCP_IFLAG_F32 : VOP1_32 <0x0000002b, "V_RCP_IFLAG_F32", []>;
+defm V_RSQ_CLAMP_F32 : VOP1_32 <0x0000002c, "V_RSQ_CLAMP_F32", []>;
+defm V_RSQ_LEGACY_F32 : VOP1_32 <
+ 0x0000002d, "V_RSQ_LEGACY_F32",
-+ [(set VReg_32:$dst, (int_AMDGPU_rsq AllReg_32:$src0))]
++ [(set VReg_32:$dst, (int_AMDGPU_rsq VSrc_32:$src0))]
+>;
+defm V_RSQ_F32 : VOP1_32 <0x0000002e, "V_RSQ_F32", []>;
+defm V_RCP_F64 : VOP1_64 <0x0000002f, "V_RCP_F64", []>;
@@ -20231,10 +21814,9 @@ index 0000000..005be96
+def V_INTERP_MOV_F32 : VINTRP <
+ 0x00000002,
+ (outs VReg_32:$dst),
-+ (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
-+ "V_INTERP_MOV_F32",
++ (ins InterpSlot:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
++ "V_INTERP_MOV_F32 $dst, $src0, $attr_chan, $attr",
+ []> {
-+ let VSRC = 0;
+ let DisableEncoding = "$m0";
+}
+
@@ -20314,22 +21896,22 @@ index 0000000..005be96
+//def S_TTRACEDATA : SOPP_ <0x00000016, "S_TTRACEDATA", []>;
+
+def V_CNDMASK_B32_e32 : VOP2 <0x00000000, (outs VReg_32:$dst),
-+ (ins AllReg_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
++ (ins VSrc_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
+ []
+>{
+ let DisableEncoding = "$vcc";
+}
+
+def V_CNDMASK_B32_e64 : VOP3 <0x00000100, (outs VReg_32:$dst),
-+ (ins VReg_32:$src0, VReg_32:$src1, SReg_1:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
++ (ins VReg_32:$src0, VReg_32:$src1, SReg_64:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
+ "V_CNDMASK_B32_e64",
-+ [(set (i32 VReg_32:$dst), (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0))]
++ [(set (i32 VReg_32:$dst), (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0))]
+>;
+
+//f32 pattern for V_CNDMASK_B32_e64
+def : Pat <
-+ (f32 (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0)),
-+ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_1:$src2)
++ (f32 (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0)),
++ (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_64:$src2)
+>;
+
+defm V_READLANE_B32 : VOP2_32 <0x00000001, "V_READLANE_B32", []>;
@@ -20337,35 +21919,35 @@ index 0000000..005be96
+
+defm V_ADD_F32 : VOP2_32 <0x00000003, "V_ADD_F32", []>;
+def : Pat <
-+ (f32 (fadd AllReg_32:$src0, VReg_32:$src1)),
-+ (V_ADD_F32_e32 AllReg_32:$src0, VReg_32:$src1)
++ (f32 (fadd VSrc_32:$src0, VReg_32:$src1)),
++ (V_ADD_F32_e32 VSrc_32:$src0, VReg_32:$src1)
+>;
+
+defm V_SUB_F32 : VOP2_32 <0x00000004, "V_SUB_F32", []>;
+def : Pat <
-+ (f32 (fsub AllReg_32:$src0, VReg_32:$src1)),
-+ (V_SUB_F32_e32 AllReg_32:$src0, VReg_32:$src1)
++ (f32 (fsub VSrc_32:$src0, VReg_32:$src1)),
++ (V_SUB_F32_e32 VSrc_32:$src0, VReg_32:$src1)
+>;
+defm V_SUBREV_F32 : VOP2_32 <0x00000005, "V_SUBREV_F32", []>;
+defm V_MAC_LEGACY_F32 : VOP2_32 <0x00000006, "V_MAC_LEGACY_F32", []>;
+defm V_MUL_LEGACY_F32 : VOP2_32 <
+ 0x00000007, "V_MUL_LEGACY_F32",
-+ [(set VReg_32:$dst, (int_AMDGPU_mul AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (int_AMDGPU_mul VSrc_32:$src0, VReg_32:$src1))]
+>;
+
+defm V_MUL_F32 : VOP2_32 <0x00000008, "V_MUL_F32",
-+ [(set VReg_32:$dst, (fmul AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (fmul VSrc_32:$src0, VReg_32:$src1))]
+>;
+//defm V_MUL_I32_I24 : VOP2_32 <0x00000009, "V_MUL_I32_I24", []>;
+//defm V_MUL_HI_I32_I24 : VOP2_32 <0x0000000a, "V_MUL_HI_I32_I24", []>;
+//defm V_MUL_U32_U24 : VOP2_32 <0x0000000b, "V_MUL_U32_U24", []>;
+//defm V_MUL_HI_U32_U24 : VOP2_32 <0x0000000c, "V_MUL_HI_U32_U24", []>;
+defm V_MIN_LEGACY_F32 : VOP2_32 <0x0000000d, "V_MIN_LEGACY_F32",
-+ [(set VReg_32:$dst, (AMDGPUfmin AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (AMDGPUfmin VSrc_32:$src0, VReg_32:$src1))]
+>;
+
+defm V_MAX_LEGACY_F32 : VOP2_32 <0x0000000e, "V_MAX_LEGACY_F32",
-+ [(set VReg_32:$dst, (AMDGPUfmax AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (AMDGPUfmax VSrc_32:$src0, VReg_32:$src1))]
+>;
+defm V_MIN_F32 : VOP2_32 <0x0000000f, "V_MIN_F32", []>;
+defm V_MAX_F32 : VOP2_32 <0x00000010, "V_MAX_F32", []>;
@@ -20380,13 +21962,13 @@ index 0000000..005be96
+defm V_LSHL_B32 : VOP2_32 <0x00000019, "V_LSHL_B32", []>;
+defm V_LSHLREV_B32 : VOP2_32 <0x0000001a, "V_LSHLREV_B32", []>;
+defm V_AND_B32 : VOP2_32 <0x0000001b, "V_AND_B32",
-+ [(set VReg_32:$dst, (and AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (and VSrc_32:$src0, VReg_32:$src1))]
+>;
+defm V_OR_B32 : VOP2_32 <0x0000001c, "V_OR_B32",
-+ [(set VReg_32:$dst, (or AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (or VSrc_32:$src0, VReg_32:$src1))]
+>;
+defm V_XOR_B32 : VOP2_32 <0x0000001d, "V_XOR_B32",
-+ [(set VReg_32:$dst, (xor AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (xor VSrc_32:$src0, VReg_32:$src1))]
+>;
+defm V_BFM_B32 : VOP2_32 <0x0000001e, "V_BFM_B32", []>;
+defm V_MAC_F32 : VOP2_32 <0x0000001f, "V_MAC_F32", []>;
@@ -20397,10 +21979,10 @@ index 0000000..005be96
+//defm V_MBCNT_HI_U32_B32 : VOP2_32 <0x00000024, "V_MBCNT_HI_U32_B32", []>;
+let Defs = [VCC] in { // Carry-out goes to VCC
+defm V_ADD_I32 : VOP2_32 <0x00000025, "V_ADD_I32",
-+ [(set VReg_32:$dst, (add (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
++ [(set VReg_32:$dst, (add (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
+>;
+defm V_SUB_I32 : VOP2_32 <0x00000026, "V_SUB_I32",
-+ [(set VReg_32:$dst, (sub (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
++ [(set VReg_32:$dst, (sub (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
+>;
+} // End Defs = [VCC]
+defm V_SUBREV_I32 : VOP2_32 <0x00000027, "V_SUBREV_I32", []>;
@@ -20412,7 +21994,7 @@ index 0000000..005be96
+////def V_CVT_PKNORM_I16_F32 : VOP2_I16 <0x0000002d, "V_CVT_PKNORM_I16_F32", []>;
+////def V_CVT_PKNORM_U16_F32 : VOP2_U16 <0x0000002e, "V_CVT_PKNORM_U16_F32", []>;
+defm V_CVT_PKRTZ_F16_F32 : VOP2_32 <0x0000002f, "V_CVT_PKRTZ_F16_F32",
-+ [(set VReg_32:$dst, (int_SI_packf16 AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (int_SI_packf16 VSrc_32:$src0, VReg_32:$src1))]
+>;
+////def V_CVT_PK_U16_U32 : VOP2_U16 <0x00000030, "V_CVT_PK_U16_U32", []>;
+////def V_CVT_PK_I16_I32 : VOP2_I16 <0x00000031, "V_CVT_PK_I16_I32", []>;
@@ -20482,6 +22064,10 @@ index 0000000..005be96
+def V_MUL_LO_U32 : VOP3_32 <0x00000169, "V_MUL_LO_U32", []>;
+def V_MUL_HI_U32 : VOP3_32 <0x0000016a, "V_MUL_HI_U32", []>;
+def V_MUL_LO_I32 : VOP3_32 <0x0000016b, "V_MUL_LO_I32", []>;
++def : Pat <
++ (mul VSrc_32:$src0, VReg_32:$src1),
++ (V_MUL_LO_I32 VSrc_32:$src0, VReg_32:$src1, (IMPLICIT_DEF), 0, 0, 0, 0)
++>;
+def V_MUL_HI_I32 : VOP3_32 <0x0000016c, "V_MUL_HI_I32", []>;
+def V_DIV_SCALE_F32 : VOP3_32 <0x0000016d, "V_DIV_SCALE_F32", []>;
+def V_DIV_SCALE_F64 : VOP3_64 <0x0000016e, "V_DIV_SCALE_F64", []>;
@@ -20519,13 +22105,20 @@ index 0000000..005be96
+def S_AND_B32 : SOP2_32 <0x0000000e, "S_AND_B32", []>;
+
+def S_AND_B64 : SOP2_64 <0x0000000f, "S_AND_B64",
-+ [(set SReg_64:$dst, (and SReg_64:$src0, SReg_64:$src1))]
++ [(set SReg_64:$dst, (i64 (and SSrc_64:$src0, SSrc_64:$src1)))]
+>;
-+def S_AND_VCC : SOP2_VCC <0x0000000f, "S_AND_B64",
-+ [(set SReg_1:$vcc, (SIvcc_and SReg_64:$src0, SReg_64:$src1))]
++
++def : Pat <
++ (i1 (and SSrc_64:$src0, SSrc_64:$src1)),
++ (S_AND_B64 SSrc_64:$src0, SSrc_64:$src1)
+>;
++
+def S_OR_B32 : SOP2_32 <0x00000010, "S_OR_B32", []>;
+def S_OR_B64 : SOP2_64 <0x00000011, "S_OR_B64", []>;
++def : Pat <
++ (i1 (or SSrc_64:$src0, SSrc_64:$src1)),
++ (S_OR_B64 SSrc_64:$src0, SSrc_64:$src1)
++>;
+def S_XOR_B32 : SOP2_32 <0x00000012, "S_XOR_B32", []>;
+def S_XOR_B64 : SOP2_64 <0x00000013, "S_XOR_B64", []>;
+def S_ANDN2_B32 : SOP2_32 <0x00000014, "S_ANDN2_B32", []>;
@@ -20554,48 +22147,6 @@ index 0000000..005be96
+//def S_CBRANCH_G_FORK : SOP2_ <0x0000002b, "S_CBRANCH_G_FORK", []>;
+def S_ABSDIFF_I32 : SOP2_32 <0x0000002c, "S_ABSDIFF_I32", []>;
+
-+class V_MOV_IMM <Operand immType, SDNode immNode> : InstSI <
-+ (outs VReg_32:$dst),
-+ (ins immType:$src0),
-+ "V_MOV_IMM",
-+ [(set VReg_32:$dst, (immNode:$src0))]
-+>;
-+
-+let isCodeGenOnly = 1, isPseudo = 1 in {
-+
-+def V_MOV_IMM_I32 : V_MOV_IMM<i32imm, imm>;
-+def V_MOV_IMM_F32 : V_MOV_IMM<f32imm, fpimm>;
-+
-+def S_MOV_IMM_I32 : InstSI <
-+ (outs SReg_32:$dst),
-+ (ins i32imm:$src0),
-+ "S_MOV_IMM_I32",
-+ [(set SReg_32:$dst, (imm:$src0))]
-+>;
-+
-+// i64 immediates aren't really supported in hardware, but LLVM will use the i64
-+// type for indices on load and store instructions. The pattern for
-+// S_MOV_IMM_I64 will only match i64 immediates that can fit into 32-bits,
-+// which the hardware can handle.
-+def S_MOV_IMM_I64 : InstSI <
-+ (outs SReg_64:$dst),
-+ (ins i64imm:$src0),
-+ "S_MOV_IMM_I64 $dst, $src0",
-+ [(set SReg_64:$dst, (IMM32bitIn64bit:$src0))]
-+>;
-+
-+} // End isCodeGenOnly, isPseudo = 1
-+
-+class SI_LOAD_LITERAL<Operand ImmType> :
-+ Enc32 <(outs), (ins ImmType:$imm), "LOAD_LITERAL $imm", []> {
-+
-+ bits<32> imm;
-+ let Inst{31-0} = imm;
-+}
-+
-+def SI_LOAD_LITERAL_I32 : SI_LOAD_LITERAL<i32imm>;
-+def SI_LOAD_LITERAL_F32 : SI_LOAD_LITERAL<f32imm>;
-+
+let isCodeGenOnly = 1, isPseudo = 1 in {
+
+def SET_M0 : InstSI <
@@ -20614,13 +22165,6 @@ index 0000000..005be96
+
+let usesCustomInserter = 1 in {
+
-+def SI_V_CNDLT : InstSI <
-+ (outs VReg_32:$dst),
-+ (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
-+ "SI_V_CNDLT $dst, $src0, $src1, $src2",
-+ [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2))]
-+>;
-+
+def SI_INTERP : InstSI <
+ (outs VReg_32:$dst),
+ (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
@@ -20628,21 +22172,6 @@ index 0000000..005be96
+ []
+>;
+
-+def SI_INTERP_CONST : InstSI <
-+ (outs VReg_32:$dst),
-+ (ins i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
-+ "SI_INTERP_CONST $dst, $attr_chan, $attr, $params",
-+ [(set VReg_32:$dst, (int_SI_fs_interp_constant imm:$attr_chan,
-+ imm:$attr, SReg_32:$params))]
-+>;
-+
-+def SI_KIL : InstSI <
-+ (outs),
-+ (ins VReg_32:$src),
-+ "SI_KIL $src",
-+ [(int_AMDGPU_kill VReg_32:$src)]
-+>;
-+
+def SI_WQM : InstSI <
+ (outs),
+ (ins),
@@ -20662,9 +22191,9 @@ index 0000000..005be96
+
+def SI_IF : InstSI <
+ (outs SReg_64:$dst),
-+ (ins SReg_1:$vcc, brtarget:$target),
++ (ins SReg_64:$vcc, brtarget:$target),
+ "SI_IF",
-+ [(set SReg_64:$dst, (int_SI_if SReg_1:$vcc, bb:$target))]
++ [(set SReg_64:$dst, (int_SI_if SReg_64:$vcc, bb:$target))]
+>;
+
+def SI_ELSE : InstSI <
@@ -20694,9 +22223,9 @@ index 0000000..005be96
+
+def SI_IF_BREAK : InstSI <
+ (outs SReg_64:$dst),
-+ (ins SReg_1:$vcc, SReg_64:$src),
++ (ins SReg_64:$vcc, SReg_64:$src),
+ "SI_IF_BREAK",
-+ [(set SReg_64:$dst, (int_SI_if_break SReg_1:$vcc, SReg_64:$src))]
++ [(set SReg_64:$dst, (int_SI_if_break SReg_64:$vcc, SReg_64:$src))]
+>;
+
+def SI_ELSE_BREAK : InstSI <
@@ -20713,18 +22242,35 @@ index 0000000..005be96
+ [(int_SI_end_cf SReg_64:$saved)]
+>;
+
++def SI_KILL : InstSI <
++ (outs),
++ (ins VReg_32:$src),
++ "SI_KIL $src",
++ [(int_AMDGPU_kill VReg_32:$src)]
++>;
++
+} // end mayLoad = 1, mayStore = 1, hasSideEffects = 1
+ // Uses = [EXEC], Defs = [EXEC]
+
+} // end IsCodeGenOnly, isPseudo
+
++def : Pat<
++ (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
++ (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, VReg_32:$src0))
++>;
++
++def : Pat <
++ (int_AMDGPU_kilp),
++ (SI_KILL (V_MOV_B32_e32 0xbf800000))
++>;
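As an aside, a minimal sketch of IR that reaches these kill patterns (the
@llvm.AMDGPU.kill spelling is inferred from the int_AMDGPU_kill name; the
literal 0xbf800000 above is -1.0f, so kilp always fails the comparison):

define void @discard_if_negative(float %alpha) {
entry:
  ; selected to SI_KILL, which SILowerControlFlow later turns into
  ; V_CMPX_LE_F32 0, %alpha to clear the pixel's EXEC bit
  call void @llvm.AMDGPU.kill(float %alpha)
  ret void
}

declare void @llvm.AMDGPU.kill(float)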
++
+/* int_SI_vs_load_input */
+def : Pat<
+ (int_SI_vs_load_input SReg_128:$tlst, IMM12bit:$attr_offset,
+ VReg_32:$buf_idx_vgpr),
+ (BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0,
+ VReg_32:$buf_idx_vgpr, SReg_128:$tlst,
-+ 0, 0, (i32 SREG_LIT_0))
++ 0, 0, 0)
+>;
+
+/* int_SI_export */
@@ -20735,43 +22281,105 @@ index 0000000..005be96
+ VReg_32:$src0, VReg_32:$src1, VReg_32:$src2, VReg_32:$src3)
+>;
+
-+/* int_SI_sample */
++
++/* int_SI_sample for simple 1D texture lookup */
+def : Pat <
-+ (int_SI_sample imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
++ (int_SI_sample imm:$writemask, (v1i32 VReg_32:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, imm),
++ (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++ (i32 (COPY_TO_REGCLASS VReg_32:$addr, VReg_32)),
+ SReg_256:$rsrc, SReg_128:$sampler)
+>;
+
-+/* int_SI_sample_lod */
-+def : Pat <
-+ (int_SI_sample_lod imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+ (IMAGE_SAMPLE_L imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
-+ SReg_256:$rsrc, SReg_128:$sampler)
++class SamplePattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++ ValueType addr_type> : Pat <
++ (name imm:$writemask, (addr_type addr_class:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, imm),
++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++ (EXTRACT_SUBREG addr_class:$addr, sub0),
++ SReg_256:$rsrc, SReg_128:$sampler)
+>;
+
-+/* int_SI_sample_bias */
-+def : Pat <
-+ (int_SI_sample_bias imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+ (IMAGE_SAMPLE_B imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
-+ SReg_256:$rsrc, SReg_128:$sampler)
++class SampleRectPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++ ValueType addr_type> : Pat <
++ (name imm:$writemask, (addr_type addr_class:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_RECT),
++ (opcode imm:$writemask, 1, 0, 0, 0, 0, 0, 0,
++ (EXTRACT_SUBREG addr_class:$addr, sub0),
++ SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleArrayPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++ ValueType addr_type> : Pat <
++ (name imm:$writemask, (addr_type addr_class:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_ARRAY),
++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
++ (EXTRACT_SUBREG addr_class:$addr, sub0),
++ SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleShadowPattern<Intrinsic name, MIMG opcode,
++ RegisterClass addr_class, ValueType addr_type> : Pat <
++ (name imm:$writemask, (addr_type addr_class:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW),
++ (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++ (EXTRACT_SUBREG addr_class:$addr, sub0),
++ SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleShadowArrayPattern<Intrinsic name, MIMG opcode,
++ RegisterClass addr_class, ValueType addr_type> : Pat <
++ (name imm:$writemask, (addr_type addr_class:$addr),
++ SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW_ARRAY),
++ (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
++ (EXTRACT_SUBREG addr_class:$addr, sub0),
++ SReg_256:$rsrc, SReg_128:$sampler)
+>;
+
++/* int_SI_sample* for texture lookups taking additional address parameters */
++multiclass SamplePatterns<RegisterClass addr_class, ValueType addr_type> {
++ def : SamplePattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++ def : SampleRectPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++ def : SampleArrayPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++ def : SampleShadowPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
++ def : SampleShadowArrayPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
++
++ def : SamplePattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
++ def : SampleArrayPattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
++ def : SampleShadowPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
++ def : SampleShadowArrayPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
++
++ def : SamplePattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
++ def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
++ def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
++ def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
++}
++
++defm : SamplePatterns<VReg_64, v2i32>;
++defm : SamplePatterns<VReg_128, v4i32>;
++defm : SamplePatterns<VReg_256, v8i32>;
++defm : SamplePatterns<VReg_512, v16i32>;
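A rough sketch of a texture fetch that would select through these patterns
(the overloaded-intrinsic mangling and argument order are assumptions based
on the Sample class in SIIntrinsics.td below; writemask 15 requests all four
channels, and the trailing 0 picks the plain, non-RECT/ARRAY/SHADOW variant):

define void @sample_2d(<4 x float> addrspace(1)* %out, <2 x i32> %coord,
                       <8 x i32> %rsrc, <4 x i32> %sampler) {
entry:
  %tex = call <4 x float> @llvm.SI.sample.v2i32(i32 15, <2 x i32> %coord,
                            <8 x i32> %rsrc, <4 x i32> %sampler, i32 0)
  store <4 x float> %tex, <4 x float> addrspace(1)* %out
  ret void
}

declare <4 x float> @llvm.SI.sample.v2i32(i32, <2 x i32>, <8 x i32>, <4 x i32>, i32) readonly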
++
+def CLAMP_SI : CLAMP<VReg_32>;
+def FABS_SI : FABS<VReg_32>;
+def FNEG_SI : FNEG<VReg_32>;
+
-+def : Extract_Element <f32, v4f32, VReg_128, 0, sel_x>;
-+def : Extract_Element <f32, v4f32, VReg_128, 1, sel_y>;
-+def : Extract_Element <f32, v4f32, VReg_128, 2, sel_z>;
-+def : Extract_Element <f32, v4f32, VReg_128, 3, sel_w>;
++def : Extract_Element <f32, v4f32, VReg_128, 0, sub0>;
++def : Extract_Element <f32, v4f32, VReg_128, 1, sub1>;
++def : Extract_Element <f32, v4f32, VReg_128, 2, sub2>;
++def : Extract_Element <f32, v4f32, VReg_128, 3, sub3>;
+
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sel_x>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sel_y>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sel_z>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sel_w>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sub0>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sub1>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sub2>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sub3>;
+
++def : Vector1_Build <v1i32, VReg_32, i32, VReg_32>;
++def : Vector2_Build <v2i32, VReg_64, i32, VReg_32>;
+def : Vector_Build <v4f32, VReg_128, f32, VReg_32>;
-+def : Vector_Build <v4i32, SReg_128, i32, SReg_32>;
++def : Vector_Build <v4i32, VReg_128, i32, VReg_32>;
++def : Vector8_Build <v8i32, VReg_256, i32, VReg_32>;
++def : Vector16_Build <v16i32, VReg_512, i32, VReg_32>;
+
+def : BitConvert <i32, f32, SReg_32>;
+def : BitConvert <i32, f32, VReg_32>;
@@ -20779,24 +22387,46 @@ index 0000000..005be96
+def : BitConvert <f32, i32, SReg_32>;
+def : BitConvert <f32, i32, VReg_32>;
+
++/********** ================== **********/
++/********** Immediate Patterns **********/
++/********** ================== **********/
++
++def : Pat <
++ (i1 imm:$imm),
++ (S_MOV_B64 imm:$imm)
++>;
++
++def : Pat <
++ (i32 imm:$imm),
++ (V_MOV_B32_e32 imm:$imm)
++>;
++
++def : Pat <
++ (f32 fpimm:$imm),
++ (V_MOV_B32_e32 fpimm:$imm)
++>;
++
+def : Pat <
-+ (i64 (SIsreg1_bitcast SReg_1:$vcc)),
-+ (S_MOV_B64 (COPY_TO_REGCLASS SReg_1:$vcc, SReg_64))
++ (i32 imm:$imm),
++ (S_MOV_B32 imm:$imm)
+>;
+
+def : Pat <
-+ (i1 (SIsreg1_bitcast SReg_64:$vcc)),
-+ (COPY_TO_REGCLASS SReg_64:$vcc, SReg_1)
++ (f32 fpimm:$imm),
++ (S_MOV_B32 fpimm:$imm)
+>;
+
+def : Pat <
-+ (i64 (SIvcc_bitcast VCCReg:$vcc)),
-+ (S_MOV_B64 (COPY_TO_REGCLASS VCCReg:$vcc, SReg_64))
++ (i64 InlineImm<i64>:$imm),
++ (S_MOV_B64 InlineImm<i64>:$imm)
+>;
+
++// i64 immediates aren't supported in hardware, so split them into two 32-bit values
+def : Pat <
-+ (i1 (SIvcc_bitcast SReg_64:$vcc)),
-+ (COPY_TO_REGCLASS SReg_64:$vcc, VCCReg)
++ (i64 imm:$imm),
++ (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
++ (S_MOV_B32 (i32 (LO32 imm:$imm))), sub0),
++ (S_MOV_B32 (i32 (HI32 imm:$imm))), sub1)
+>;
+
+/********** ===================== **********/
@@ -20804,6 +22434,12 @@ index 0000000..005be96
+/********** ===================== **********/
+
+def : Pat <
++ (int_SI_fs_interp_constant imm:$attr_chan, imm:$attr, SReg_32:$params),
++ (V_INTERP_MOV_F32 INTERP.P0, imm:$attr_chan, imm:$attr,
++ (S_MOV_B32 SReg_32:$params))
++>;
++
++def : Pat <
+ (int_SI_fs_interp_linear_center imm:$attr_chan, imm:$attr, SReg_32:$params),
+ (SI_INTERP (f32 LINEAR_CENTER_I), (f32 LINEAR_CENTER_J), imm:$attr_chan,
+ imm:$attr, SReg_32:$params)
@@ -20861,56 +22497,95 @@ index 0000000..005be96
+def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_F32_e32, VReg_32>;
+
+def : Pat <
-+ (int_AMDGPU_div AllReg_32:$src0, AllReg_32:$src1),
-+ (V_MUL_LEGACY_F32_e32 AllReg_32:$src0, (V_RCP_LEGACY_F32_e32 AllReg_32:$src1))
++ (int_AMDGPU_div VSrc_32:$src0, VSrc_32:$src1),
++ (V_MUL_LEGACY_F32_e32 VSrc_32:$src0, (V_RCP_LEGACY_F32_e32 VSrc_32:$src1))
+>;
+
+def : Pat<
-+ (fdiv AllReg_32:$src0, AllReg_32:$src1),
-+ (V_MUL_F32_e32 AllReg_32:$src0, (V_RCP_F32_e32 AllReg_32:$src1))
++ (fdiv VSrc_32:$src0, VSrc_32:$src1),
++ (V_MUL_F32_e32 VSrc_32:$src0, (V_RCP_F32_e32 VSrc_32:$src1))
+>;
+
+def : Pat <
-+ (int_AMDGPU_kilp),
-+ (SI_KIL (V_MOV_IMM_I32 0xbf800000))
++ (fcos VSrc_32:$src0),
++ (V_COS_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
++>;
++
++def : Pat <
++ (fsin VSrc_32:$src0),
++ (V_SIN_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
+>;
+
+def : Pat <
+ (int_AMDGPU_cube VReg_128:$src),
+ (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)),
-+ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+ 0, 0, 0, 0), sel_x),
-+ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+ 0, 0, 0, 0), sel_y),
-+ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+ 0, 0, 0, 0), sel_z),
-+ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+ (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+ 0, 0, 0, 0), sel_w)
++ (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++ (EXTRACT_SUBREG VReg_128:$src, sub1),
++ (EXTRACT_SUBREG VReg_128:$src, sub2),
++ 0, 0, 0, 0), sub0),
++ (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++ (EXTRACT_SUBREG VReg_128:$src, sub1),
++ (EXTRACT_SUBREG VReg_128:$src, sub2),
++ 0, 0, 0, 0), sub1),
++ (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++ (EXTRACT_SUBREG VReg_128:$src, sub1),
++ (EXTRACT_SUBREG VReg_128:$src, sub2),
++ 0, 0, 0, 0), sub2),
++ (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++ (EXTRACT_SUBREG VReg_128:$src, sub1),
++ (EXTRACT_SUBREG VReg_128:$src, sub2),
++ 0, 0, 0, 0), sub3)
++>;
++
++def : Pat <
++ (i32 (sext (i1 SReg_64:$src0))),
++ (V_CNDMASK_B32_e64 (i32 0), (i32 -1), SReg_64:$src0)
+>;
+
+/********** ================== **********/
+/********** VOP3 Patterns **********/
+/********** ================== **********/
+
-+def : Pat <(f32 (IL_mad AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2)),
-+ (V_MAD_LEGACY_F32 AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2,
++def : Pat <(f32 (IL_mad VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2)),
++ (V_MAD_LEGACY_F32 VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2,
+ 0, 0, 0, 0)>;
+
++/********** ================== **********/
++/********** SMRD Patterns **********/
++/********** ================== **********/
++
++multiclass SMRD_Pattern <SMRD Instr_IMM, SMRD Instr_SGPR, ValueType vt> {
++ // 1. Offset as an 8-bit DWORD immediate
++ def : Pat <
++ (constant_load (SIadd64bit32bit SReg_64:$sbase, IMM8bitDWORD:$offset)),
++ (vt (Instr_IMM SReg_64:$sbase, IMM8bitDWORD:$offset))
++ >;
++
++ // 2. Offset loaded in a 32-bit SGPR
++ def : Pat <
++ (constant_load (SIadd64bit32bit SReg_64:$sbase, imm:$offset)),
++ (vt (Instr_SGPR SReg_64:$sbase, (S_MOV_B32 imm:$offset)))
++ >;
++
++ // 3. No offset at all
++ def : Pat <
++ (constant_load SReg_64:$sbase),
++ (vt (Instr_IMM SReg_64:$sbase, 0))
++ >;
++}
++
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, f32>;
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, i32>;
++defm : SMRD_Pattern <S_LOAD_DWORDX4_IMM, S_LOAD_DWORDX4_SGPR, v4i32>;
++defm : SMRD_Pattern <S_LOAD_DWORDX8_IMM, S_LOAD_DWORDX8_SGPR, v8i32>;
++
+} // End isSI predicate
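To make the three SMRD cases concrete, a sketch of a scalar load from
constant memory (treating addrspace(2) as the constant address space is an
assumption; the byte offset from the GEP is encoded as a dword count, hence
IMM8bitDWORD):

define void @smrd_load(i32 addrspace(1)* %out, i32 addrspace(2)* %cbuf) {
entry:
  ; small literal offset -> case 1 (_IMM form); an offset too large for
  ; 8 bits would be moved into an SGPR first, case 2; loading %cbuf
  ; directly with no offset is case 3
  %ptr = getelementptr i32 addrspace(2)* %cbuf, i32 4
  %val = load i32 addrspace(2)* %ptr
  store i32 %val, i32 addrspace(1)* %out
  ret void
}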
diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
new file mode 100644
-index 0000000..c322fef
+index 0000000..611b9c4
--- /dev/null
+++ b/lib/Target/R600/SIIntrinsics.td
-@@ -0,0 +1,52 @@
+@@ -0,0 +1,54 @@
+//===-- SIIntrinsics.td - SI Intrinsic defs ----------------*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
@@ -20935,9 +22610,11 @@ index 0000000..c322fef
+ def int_SI_vs_load_input : Intrinsic <[llvm_v4f32_ty], [llvm_v4i32_ty, llvm_i16_ty, llvm_i32_ty], [IntrReadMem]> ;
+ def int_SI_wqm : Intrinsic <[], [], []>;
+
-+ def int_SI_sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
-+ def int_SI_sample_bias : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
-+ def int_SI_sample_lod : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
++ class Sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_anyvector_ty, llvm_v8i32_ty, llvm_v4i32_ty, llvm_i32_ty], [IntrReadMem]>;
++
++ def int_SI_sample : Sample;
++ def int_SI_sampleb : Sample;
++ def int_SI_samplel : Sample;
+
+ /* Interpolation Intrinsics */
+
@@ -20965,10 +22642,10 @@ index 0000000..c322fef
+}
diff --git a/lib/Target/R600/SILowerControlFlow.cpp b/lib/Target/R600/SILowerControlFlow.cpp
new file mode 100644
-index 0000000..3fbe653
+index 0000000..2007d30
--- /dev/null
+++ b/lib/Target/R600/SILowerControlFlow.cpp
-@@ -0,0 +1,331 @@
+@@ -0,0 +1,372 @@
+//===-- SILowerControlFlow.cpp - Use predicates for control flow ----------===//
+//
+// The LLVM Compiler Infrastructure
@@ -21039,7 +22716,10 @@ index 0000000..3fbe653
+ static char ID;
+ const TargetInstrInfo *TII;
+
-+ void Skip(MachineInstr &MI, MachineOperand &To);
++ bool shouldSkip(MachineBasicBlock *From, MachineBasicBlock *To);
++
++ void Skip(MachineInstr &From, MachineOperand &To);
++ void SkipIfDead(MachineInstr &MI);
+
+ void If(MachineInstr &MI);
+ void Else(MachineInstr &MI);
@@ -21049,6 +22729,7 @@ index 0000000..3fbe653
+ void Loop(MachineInstr &MI);
+ void EndCf(MachineInstr &MI);
+
++ void Kill(MachineInstr &MI);
+ void Branch(MachineInstr &MI);
+
+public:
@@ -21071,22 +22752,29 @@ index 0000000..3fbe653
+ return new SILowerControlFlowPass(tm);
+}
+
-+void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
++bool SILowerControlFlowPass::shouldSkip(MachineBasicBlock *From,
++ MachineBasicBlock *To) {
++
+ unsigned NumInstr = 0;
+
-+ for (MachineBasicBlock *MBB = *From.getParent()->succ_begin();
-+ NumInstr < SkipThreshold && MBB != To.getMBB() && !MBB->succ_empty();
++ for (MachineBasicBlock *MBB = From; MBB != To && !MBB->succ_empty();
+ MBB = *MBB->succ_begin()) {
+
+ for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end();
+ NumInstr < SkipThreshold && I != E; ++I) {
+
+ if (I->isBundle() || !I->isBundled())
-+ ++NumInstr;
++ if (++NumInstr >= SkipThreshold)
++ return true;
+ }
+ }
+
-+ if (NumInstr < SkipThreshold)
++ return false;
++}
++
++void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
++
++ if (!shouldSkip(*From.getParent()->succ_begin(), To.getMBB()))
+ return;
+
+ DebugLoc DL = From.getDebugLoc();
@@ -21095,6 +22783,38 @@ index 0000000..3fbe653
+ .addReg(AMDGPU::EXEC);
+}
+
++void SILowerControlFlowPass::SkipIfDead(MachineInstr &MI) {
++
++ MachineBasicBlock &MBB = *MI.getParent();
++ DebugLoc DL = MI.getDebugLoc();
++
++ if (!shouldSkip(&MBB, &MBB.getParent()->back()))
++ return;
++
++ MachineBasicBlock::iterator Insert = &MI;
++ ++Insert;
++
++ // If the exec mask is non-zero, skip the next two instructions
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
++ .addImm(3)
++ .addReg(AMDGPU::EXEC);
++
++ // Exec mask is zero: Export to NULL target...
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::EXP))
++ .addImm(0)
++ .addImm(0x09) // V_008DFC_SQ_EXP_NULL
++ .addImm(0)
++ .addImm(1)
++ .addImm(1)
++ .addReg(AMDGPU::VGPR0)
++ .addReg(AMDGPU::VGPR0)
++ .addReg(AMDGPU::VGPR0)
++ .addReg(AMDGPU::VGPR0);
++
++ // ... and terminate wavefront
++ BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_ENDPGM));
++}
++
+void SILowerControlFlowPass::If(MachineInstr &MI) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ DebugLoc DL = MI.getDebugLoc();
@@ -21213,8 +22933,28 @@ index 0000000..3fbe653
+ assert(0);
+}
+
++void SILowerControlFlowPass::Kill(MachineInstr &MI) {
++
++ MachineBasicBlock &MBB = *MI.getParent();
++ DebugLoc DL = MI.getDebugLoc();
++
++ // Kill is only allowed in pixel shaders
++ MachineFunction &MF = *MBB.getParent();
++ SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
++ assert(Info->ShaderType == ShaderType::PIXEL);
++
++ // Clear this pixel from the exec mask if the operand is negative
++ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32), AMDGPU::VCC)
++ .addImm(0)
++ .addOperand(MI.getOperand(0));
++
++ MI.eraseFromParent();
++}
++
+bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
-+ bool HaveCf = false;
++
++ bool HaveKill = false;
++ unsigned Depth = 0;
+
+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
+ BI != BE; ++BI) {
@@ -21228,6 +22968,7 @@ index 0000000..3fbe653
+ switch (MI.getOpcode()) {
+ default: break;
+ case AMDGPU::SI_IF:
++ ++Depth;
+ If(MI);
+ break;
+
@@ -21248,171 +22989,34 @@ index 0000000..3fbe653
+ break;
+
+ case AMDGPU::SI_LOOP:
++ ++Depth;
+ Loop(MI);
+ break;
+
-+ case AMDGPU::SI_END_CF:
-+ HaveCf = true;
-+ EndCf(MI);
-+ break;
-+
-+ case AMDGPU::S_BRANCH:
-+ Branch(MI);
-+ break;
-+ }
-+ }
-+ }
-+
-+ // TODO: What is this good for?
-+ unsigned ShaderType = MF.getInfo<SIMachineFunctionInfo>()->ShaderType;
-+ if (HaveCf && ShaderType == ShaderType::PIXEL) {
-+ for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
-+ BI != BE; ++BI) {
-+
-+ MachineBasicBlock &MBB = *BI;
-+ if (MBB.succ_empty()) {
-+
-+ MachineInstr &MI = *MBB.getFirstNonPHI();
-+ DebugLoc DL = MI.getDebugLoc();
-+
-+ // If the exec mask is non-zero, skip the next two instructions
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
-+ .addImm(3)
-+ .addReg(AMDGPU::EXEC);
-+
-+ // Exec mask is zero: Export to NULL target...
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::EXP))
-+ .addImm(0)
-+ .addImm(0x09) // V_008DFC_SQ_EXP_NULL
-+ .addImm(0)
-+ .addImm(1)
-+ .addImm(1)
-+ .addReg(AMDGPU::SREG_LIT_0)
-+ .addReg(AMDGPU::SREG_LIT_0)
-+ .addReg(AMDGPU::SREG_LIT_0)
-+ .addReg(AMDGPU::SREG_LIT_0);
-+
-+ // ... and terminate wavefront
-+ BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ENDPGM));
-+ }
-+ }
-+ }
-+
-+ return true;
-+}
-diff --git a/lib/Target/R600/SILowerLiteralConstants.cpp b/lib/Target/R600/SILowerLiteralConstants.cpp
-new file mode 100644
-index 0000000..c0411e9
---- /dev/null
-+++ b/lib/Target/R600/SILowerLiteralConstants.cpp
-@@ -0,0 +1,108 @@
-+//===-- SILowerLiteralConstants.cpp - Lower intrs using literal constants--===//
-+//
-+// The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief This pass performs the following transformation on instructions with
-+/// literal constants:
-+///
-+/// %VGPR0 = V_MOV_IMM_I32 1
-+///
-+/// becomes:
-+///
-+/// BUNDLE
-+/// * %VGPR = V_MOV_B32_32 SI_LITERAL_CONSTANT
-+/// * SI_LOAD_LITERAL 1
-+///
-+/// The resulting sequence matches exactly how the hardware handles immediate
-+/// operands, so this transformation greatly simplifies the code generator.
-+///
-+/// Only the *_MOV_IMM_* support immediate operands at the moment, but when
-+/// support for immediate operands is added to other instructions, they
-+/// will be lowered here as well.
-+//===----------------------------------------------------------------------===//
-+
-+#include "AMDGPU.h"
-+#include "llvm/CodeGen/MachineFunction.h"
-+#include "llvm/CodeGen/MachineFunctionPass.h"
-+#include "llvm/CodeGen/MachineInstrBuilder.h"
-+#include "llvm/CodeGen/MachineInstrBundle.h"
-+
-+using namespace llvm;
-+
-+namespace {
-+
-+class SILowerLiteralConstantsPass : public MachineFunctionPass {
-+
-+private:
-+ static char ID;
-+ const TargetInstrInfo *TII;
-+
-+public:
-+ SILowerLiteralConstantsPass(TargetMachine &tm) :
-+ MachineFunctionPass(ID), TII(tm.getInstrInfo()) { }
-+
-+ virtual bool runOnMachineFunction(MachineFunction &MF);
-+
-+ const char *getPassName() const {
-+ return "SI Lower literal constants pass";
-+ }
-+};
-+
-+} // End anonymous namespace
-+
-+char SILowerLiteralConstantsPass::ID = 0;
-+
-+FunctionPass *llvm::createSILowerLiteralConstantsPass(TargetMachine &tm) {
-+ return new SILowerLiteralConstantsPass(tm);
-+}
-+
-+bool SILowerLiteralConstantsPass::runOnMachineFunction(MachineFunction &MF) {
-+ for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
-+ BB != BB_E; ++BB) {
-+ MachineBasicBlock &MBB = *BB;
-+ for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
-+ I != MBB.end(); I = Next) {
-+ Next = llvm::next(I);
-+ MachineInstr &MI = *I;
-+ switch (MI.getOpcode()) {
-+ default: break;
-+ case AMDGPU::S_MOV_IMM_I32:
-+ case AMDGPU::S_MOV_IMM_I64:
-+ case AMDGPU::V_MOV_IMM_F32:
-+ case AMDGPU::V_MOV_IMM_I32: {
-+ unsigned MovOpcode;
-+ unsigned LoadLiteralOpcode;
-+ MachineOperand LiteralOp = MI.getOperand(1);
-+ if (AMDGPU::VReg_32RegClass.contains(MI.getOperand(0).getReg())) {
-+ MovOpcode = AMDGPU::V_MOV_B32_e32;
-+ } else {
-+ MovOpcode = AMDGPU::S_MOV_B32;
-+ }
-+ if (LiteralOp.isImm()) {
-+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_I32;
-+ } else {
-+ LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_F32;
++ case AMDGPU::SI_END_CF:
++ if (--Depth == 0 && HaveKill) {
++ SkipIfDead(MI);
++ HaveKill = false;
+ }
-+ MIBundleBuilder Bundle(MBB, I);
-+ Bundle
-+ .append(BuildMI(MF, MBB.findDebugLoc(I), TII->get(MovOpcode),
-+ MI.getOperand(0).getReg())
-+ .addReg(AMDGPU::SI_LITERAL_CONSTANT))
-+ .append(BuildMI(MF, MBB.findDebugLoc(I),
-+ TII->get(LoadLiteralOpcode))
-+ .addOperand(MI.getOperand(1)));
-+ llvm::finalizeBundle(MBB, Bundle.begin());
-+ MI.eraseFromParent();
++ EndCf(MI);
++ break;
++
++ case AMDGPU::SI_KILL:
++ if (Depth == 0)
++ SkipIfDead(MI);
++ else
++ HaveKill = true;
++ Kill(MI);
++ break;
++
++ case AMDGPU::S_BRANCH:
++ Branch(MI);
+ break;
-+ }
+ }
+ }
+ }
-+ return false;
++
++ return true;
+}
diff --git a/lib/Target/R600/SIMachineFunctionInfo.cpp b/lib/Target/R600/SIMachineFunctionInfo.cpp
new file mode 100644
@@ -21589,24 +23193,10 @@ index 0000000..40171e4
+#endif // SIREGISTERINFO_H_
diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td
new file mode 100644
-index 0000000..c3f1361
+index 0000000..ab36b87
--- /dev/null
+++ b/lib/Target/R600/SIRegisterInfo.td
-@@ -0,0 +1,167 @@
-+
-+let Namespace = "AMDGPU" in {
-+ def low : SubRegIndex;
-+ def high : SubRegIndex;
-+
-+ def sub0 : SubRegIndex;
-+ def sub1 : SubRegIndex;
-+ def sub2 : SubRegIndex;
-+ def sub3 : SubRegIndex;
-+ def sub4 : SubRegIndex;
-+ def sub5 : SubRegIndex;
-+ def sub6 : SubRegIndex;
-+ def sub7 : SubRegIndex;
-+}
+@@ -0,0 +1,190 @@
+
+class SIReg <string n, bits<16> encoding = 0> : Register<n> {
+ let Namespace = "AMDGPU";
@@ -21615,13 +23205,15 @@ index 0000000..c3f1361
+
+class SI_64 <string n, list<Register> subregs, bits<16> encoding> : RegisterWithSubRegs<n, subregs> {
+ let Namespace = "AMDGPU";
-+ let SubRegIndices = [low, high];
++ let SubRegIndices = [sub0, sub1];
+ let HWEncoding = encoding;
+}
+
+class SGPR_32 <bits<16> num, string name> : SIReg<name, num>;
+
-+class VGPR_32 <bits<16> num, string name> : SIReg<name, num>;
++class VGPR_32 <bits<16> num, string name> : SIReg<name, num> {
++ let HWEncoding{8} = 1;
++}
+
+// Special Registers
+def VCC : SIReg<"VCC", 106>;
@@ -21629,8 +23221,6 @@ index 0000000..c3f1361
+def EXEC_HI : SIReg <"EXEC HI", 127>;
+def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
+def SCC : SIReg<"SCC", 253>;
-+def SREG_LIT_0 : SIReg <"S LIT 0", 128>;
-+def SI_LITERAL_CONSTANT : SIReg<"LITERAL CONSTANT", 255>;
+def M0 : SIReg <"M0", 124>;
+
+//Interpolation registers
@@ -21668,12 +23258,12 @@ index 0000000..c3f1361
+ (add (sequence "SGPR%u", 0, 101))>;
+
+// SGPR 64-bit registers
-+def SGPR_64 : RegisterTuples<[low, high],
++def SGPR_64 : RegisterTuples<[sub0, sub1],
+ [(add (decimate SGPR_32, 2)),
+ (add(decimate (rotl SGPR_32, 1), 2))]>;
+
+// SGPR 128-bit registers
-+def SGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
++def SGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
+ [(add (decimate SGPR_32, 4)),
+ (add (decimate (rotl SGPR_32, 1), 4)),
+ (add (decimate (rotl SGPR_32, 2), 4)),
@@ -21699,32 +23289,61 @@ index 0000000..c3f1361
+ (add (sequence "VGPR%u", 0, 255))>;
+
+// VGPR 64-bit registers
-+def VGPR_64 : RegisterTuples<[low, high],
++def VGPR_64 : RegisterTuples<[sub0, sub1],
+ [(add VGPR_32),
+ (add (rotl VGPR_32, 1))]>;
+
+// VGPR 128-bit registers
-+def VGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
++def VGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
+ [(add VGPR_32),
+ (add (rotl VGPR_32, 1)),
+ (add (rotl VGPR_32, 2)),
+ (add (rotl VGPR_32, 3))]>;
+
++// VGPR 256-bit registers
++def VGPR_256 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
++ [(add VGPR_32),
++ (add (rotl VGPR_32, 1)),
++ (add (rotl VGPR_32, 2)),
++ (add (rotl VGPR_32, 3)),
++ (add (rotl VGPR_32, 4)),
++ (add (rotl VGPR_32, 5)),
++ (add (rotl VGPR_32, 6)),
++ (add (rotl VGPR_32, 7))]>;
++
++// VGPR 512-bit registers
++def VGPR_512 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
++ sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15],
++ [(add VGPR_32),
++ (add (rotl VGPR_32, 1)),
++ (add (rotl VGPR_32, 2)),
++ (add (rotl VGPR_32, 3)),
++ (add (rotl VGPR_32, 4)),
++ (add (rotl VGPR_32, 5)),
++ (add (rotl VGPR_32, 6)),
++ (add (rotl VGPR_32, 7)),
++ (add (rotl VGPR_32, 8)),
++ (add (rotl VGPR_32, 9)),
++ (add (rotl VGPR_32, 10)),
++ (add (rotl VGPR_32, 11)),
++ (add (rotl VGPR_32, 12)),
++ (add (rotl VGPR_32, 13)),
++ (add (rotl VGPR_32, 14)),
++ (add (rotl VGPR_32, 15))]>;
++
+// Register class for all scalar registers (SGPRs + Special Registers)
+def SReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
-+ (add SGPR_32, SREG_LIT_0, M0, EXEC_LO, EXEC_HI)
++ (add SGPR_32, M0, EXEC_LO, EXEC_HI)
+>;
+
-+def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
-+
-+def SReg_1 : RegisterClass<"AMDGPU", [i1], 1, (add VCC, SGPR_64, EXEC)>;
++def SReg_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SGPR_64, VCC, EXEC)>;
+
+def SReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add SGPR_128)>;
+
+def SReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add SGPR_256)>;
+
+// Register class for all vector registers (VGPRs + Interpolation Registers)
-+def VReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
++def VReg_32 : RegisterClass<"AMDGPU", [f32, i32, v1i32], 32,
+ (add VGPR_32,
+ PERSP_SAMPLE_I, PERSP_SAMPLE_J,
+ PERSP_CENTER_I, PERSP_CENTER_J,
@@ -21745,14 +23364,22 @@ index 0000000..c3f1361
+ )
+>;
+
-+def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
++def VReg_64 : RegisterClass<"AMDGPU", [i64, v2i32], 64, (add VGPR_64)>;
++
++def VReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add VGPR_128)>;
++
++def VReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add VGPR_256)>;
+
-+def VReg_128 : RegisterClass<"AMDGPU", [v4f32], 128, (add VGPR_128)>;
++def VReg_512 : RegisterClass<"AMDGPU", [v16i32], 512, (add VGPR_512)>;
+
-+// AllReg_* - A set of all scalar and vector registers of a given width.
-+def AllReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, (add VReg_32, SReg_32)>;
++// [SV]Src_* operands can have either an immediate or a register
++def SSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add SReg_32)>;
+
-+def AllReg_64 : RegisterClass<"AMDGPU", [f64, i64], 64, (add SReg_64, VReg_64)>;
++def SSrc_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SReg_64)>;
++
++def VSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add VReg_32, SReg_32)>;
++
++def VSrc_64 : RegisterClass<"AMDGPU", [i64], 64, (add SReg_64, VReg_64)>;
+
+// Special register classes for predicates and the M0 register
+def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
@@ -21876,6 +23503,30 @@ index 0000000..b8ac4e7
+CPPFLAGS = -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
+
+include $(LEVEL)/Makefile.common
+diff --git a/test/CodeGen/R600/128bit-kernel-args.ll b/test/CodeGen/R600/128bit-kernel-args.ll
+new file mode 100644
+index 0000000..114f9e7
+--- /dev/null
++++ b/test/CodeGen/R600/128bit-kernel-args.ll
+@@ -0,0 +1,18 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; CHECK: @v4i32_kernel_arg
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
++
++define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32> %in) {
++entry:
++ store <4 x i32> %in, <4 x i32> addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @v4f32_kernel_arg
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
++define void @v4f32_kernel_arg(<4 x float> addrspace(1)* %out, <4 x float> %in) {
++entry:
++ store <4 x float> %in, <4 x float> addrspace(1)* %out
++ ret void
++}
diff --git a/test/CodeGen/R600/add.v4i32.ll b/test/CodeGen/R600/add.v4i32.ll
new file mode 100644
index 0000000..ac4a874
@@ -21918,6 +23569,82 @@ index 0000000..662085e
+ store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
+new file mode 100644
+index 0000000..fd958b3
+--- /dev/null
++++ b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
+@@ -0,0 +1,36 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; This test is for a bug in
++; DAGCombiner::reduceBuildVecConvertToConvertBuildVec() where
++; the wrong type was being passed to
++; TargetLowering::getOperationAction() when checking the legality of
++; ISD::UINT_TO_FP and ISD::SINT_TO_FP opcodes.
++
++
++; CHECK: @sint
++; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++
++define void @sint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
++entry:
++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1
++ %sint = load i32 addrspace(1) * %in
++ %conv = sitofp i32 %sint to float
++ %0 = insertelement <4 x float> undef, float %conv, i32 0
++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
++ store <4 x float> %splat, <4 x float> addrspace(1)* %out
++ ret void
++}
++
++;CHECK: @uint
++;CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++
++define void @uint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
++entry:
++ %ptr = getelementptr i32 addrspace(1)* %in, i32 1
++ %uint = load i32 addrspace(1) * %in
++ %conv = uitofp i32 %uint to float
++ %0 = insertelement <4 x float> undef, float %conv, i32 0
++ %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
++ store <4 x float> %splat, <4 x float> addrspace(1)* %out
++ ret void
++}
+diff --git a/test/CodeGen/R600/disconnected-predset-break-bug.ll b/test/CodeGen/R600/disconnected-predset-break-bug.ll
+new file mode 100644
+index 0000000..a586742
+--- /dev/null
++++ b/test/CodeGen/R600/disconnected-predset-break-bug.ll
+@@ -0,0 +1,28 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; PRED_SET* instructions must be tied to any instruction that uses their
++; result. This tests that there are no instructions between the PRED_SET*
++; and the PREDICATE_BREAK in this loop.
++
++; CHECK: @loop_ge
++; CHECK: WHILE
++; CHECK: PRED_SET
++; CHECK-NEXT: PREDICATED_BREAK
++define void @loop_ge(i32 addrspace(1)* nocapture %out, i32 %iterations) nounwind {
++entry:
++ %cmp5 = icmp sgt i32 %iterations, 0
++ br i1 %cmp5, label %for.body, label %for.end
++
++for.body: ; preds = %for.body, %entry
++ %i.07.in = phi i32 [ %i.07, %for.body ], [ %iterations, %entry ]
++ %ai.06 = phi i32 [ %add, %for.body ], [ 0, %entry ]
++ %i.07 = add nsw i32 %i.07.in, -1
++ %arrayidx = getelementptr inbounds i32 addrspace(1)* %out, i32 %ai.06
++ store i32 %i.07, i32 addrspace(1)* %arrayidx, align 4
++ %add = add nsw i32 %ai.06, 1
++ %exitcond = icmp eq i32 %add, %iterations
++ br i1 %exitcond, label %for.end, label %for.body
++
++for.end: ; preds = %for.body, %entry
++ ret void
++}
diff --git a/test/CodeGen/R600/fabs.ll b/test/CodeGen/R600/fabs.ll
new file mode 100644
index 0000000..0407533
@@ -22027,15 +23754,13 @@ index 0000000..5c981ef
+}
diff --git a/test/CodeGen/R600/fcmp.ll b/test/CodeGen/R600/fcmp.ll
new file mode 100644
-index 0000000..1dcd07c
+index 0000000..89f5e9e
--- /dev/null
+++ b/test/CodeGen/R600/fcmp.ll
-@@ -0,0 +1,16 @@
+@@ -0,0 +1,14 @@
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
-+;CHECK: SETE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-+;CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
-+;CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++;CHECK: SETE_DX10 T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) {
+entry:
@@ -22183,14 +23908,13 @@ index 0000000..6d44a0c
+}
diff --git a/test/CodeGen/R600/fsub.ll b/test/CodeGen/R600/fsub.ll
new file mode 100644
-index 0000000..0ec1c37
+index 0000000..591aa52
--- /dev/null
+++ b/test/CodeGen/R600/fsub.ll
-@@ -0,0 +1,17 @@
+@@ -0,0 +1,16 @@
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
-+; CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
-+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
+
+define void @test() {
+ %r0 = call float @llvm.R600.load.input(i32 0)
@@ -22266,6 +23990,64 @@ index 0000000..aad44d9
+ store i32 %value, i32 addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
+new file mode 100644
+index 0000000..382f78c
+--- /dev/null
++++ b/test/CodeGen/R600/kcache-fold.ll
+@@ -0,0 +1,52 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
++
++define void @main() {
++main_body:
++ %0 = load <4 x float> addrspace(9)* null
++ %1 = extractelement <4 x float> %0, i32 0
++ %2 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++ %3 = extractelement <4 x float> %2, i32 0
++ %4 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++ %5 = extractelement <4 x float> %4, i32 0
++ %6 = fcmp ult float %1, 0.000000e+00
++ %7 = select i1 %6, float %3, float %5
++ %8 = load <4 x float> addrspace(9)* null
++ %9 = extractelement <4 x float> %8, i32 1
++ %10 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++ %11 = extractelement <4 x float> %10, i32 1
++ %12 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++ %13 = extractelement <4 x float> %12, i32 1
++ %14 = fcmp ult float %9, 0.000000e+00
++ %15 = select i1 %14, float %11, float %13
++ %16 = load <4 x float> addrspace(9)* null
++ %17 = extractelement <4 x float> %16, i32 2
++ %18 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++ %19 = extractelement <4 x float> %18, i32 2
++ %20 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++ %21 = extractelement <4 x float> %20, i32 2
++ %22 = fcmp ult float %17, 0.000000e+00
++ %23 = select i1 %22, float %19, float %21
++ %24 = load <4 x float> addrspace(9)* null
++ %25 = extractelement <4 x float> %24, i32 3
++ %26 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++ %27 = extractelement <4 x float> %26, i32 3
++ %28 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++ %29 = extractelement <4 x float> %28, i32 3
++ %30 = fcmp ult float %25, 0.000000e+00
++ %31 = select i1 %30, float %27, float %29
++ %32 = call float @llvm.AMDIL.clamp.(float %7, float 0.000000e+00, float 1.000000e+00)
++ %33 = call float @llvm.AMDIL.clamp.(float %15, float 0.000000e+00, float 1.000000e+00)
++ %34 = call float @llvm.AMDIL.clamp.(float %23, float 0.000000e+00, float 1.000000e+00)
++ %35 = call float @llvm.AMDIL.clamp.(float %31, float 0.000000e+00, float 1.000000e+00)
++ %36 = insertelement <4 x float> undef, float %32, i32 0
++ %37 = insertelement <4 x float> %36, float %33, i32 1
++ %38 = insertelement <4 x float> %37, float %34, i32 2
++ %39 = insertelement <4 x float> %38, float %35, i32 3
++ call void @llvm.R600.store.swizzle(<4 x float> %39, i32 0, i32 0)
++ ret void
++}
++
++declare float @llvm.AMDIL.clamp.(float, float, float) readnone
++declare void @llvm.R600.store.swizzle(<4 x float>, i32, i32)
diff --git a/test/CodeGen/R600/lit.local.cfg b/test/CodeGen/R600/lit.local.cfg
new file mode 100644
index 0000000..36ee493
@@ -22287,10 +24069,10 @@ index 0000000..36ee493
+
diff --git a/test/CodeGen/R600/literals.ll b/test/CodeGen/R600/literals.ll
new file mode 100644
-index 0000000..4c731b2
+index 0000000..be62342
--- /dev/null
+++ b/test/CodeGen/R600/literals.ll
-@@ -0,0 +1,30 @@
+@@ -0,0 +1,32 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
+; Test using an integer literal constant.
@@ -22299,6 +24081,7 @@ index 0000000..4c731b2
+; or
+; ADD_INT literal.x REG, 5
+
++; CHECK: @i32_literal
+; CHECK: ADD_INT {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} 5
+define void @i32_literal(i32 addrspace(1)* %out, i32 %in) {
+entry:
@@ -22313,6 +24096,7 @@ index 0000000..4c731b2
+; or
+; ADD literal.x REG, 5.0
+
++; CHECK: @float_literal
+; CHECK: ADD {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} {{[0-9]+}}(5.0
+define void @float_literal(float addrspace(1)* %out, float %in) {
+entry:
@@ -22366,6 +24150,35 @@ index 0000000..fac957f
+declare void @llvm.AMDGPU.store.output(float, i32)
+
+declare float @llvm.AMDGPU.trunc(float ) readnone
+diff --git a/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
+new file mode 100644
+index 0000000..0c19f14
+--- /dev/null
++++ b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
+@@ -0,0 +1,23 @@
++;RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
++
++;CHECK: S_MOV_B32
++;CHECK-NEXT: V_INTERP_MOV_F32
++
++define void @main() {
++main_body:
++ call void @llvm.AMDGPU.shader.type(i32 0)
++ %0 = load i32 addrspace(8)* inttoptr (i32 6 to i32 addrspace(8)*)
++ %1 = call float @llvm.SI.fs.interp.constant(i32 0, i32 0, i32 %0)
++ %2 = call i32 @llvm.SI.packf16(float %1, float %1)
++ %3 = bitcast i32 %2 to float
++ call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
++ ret void
++}
++
++declare void @llvm.AMDGPU.shader.type(i32)
++
++declare float @llvm.SI.fs.interp.constant(i32, i32, i32) readonly
++
++declare i32 @llvm.SI.packf16(float, float) readnone
++
++declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
diff --git a/test/CodeGen/R600/llvm.cos.ll b/test/CodeGen/R600/llvm.cos.ll
new file mode 100644
index 0000000..dc120bf
@@ -22466,6 +24279,112 @@ index 0000000..b070dcd
+ store i32 %2, i32 addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/predicates.ll b/test/CodeGen/R600/predicates.ll
+new file mode 100644
+index 0000000..18895a4
+--- /dev/null
++++ b/test/CodeGen/R600/predicates.ll
+@@ -0,0 +1,100 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests make sure the compiler is optimizing branches using predicates
++; when it is legal to do so.
++
++; CHECK: @simple_if
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++define void @simple_if(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sgt i32 %in, 0
++ br i1 %0, label %IF, label %ENDIF
++
++IF:
++ %1 = shl i32 %in, 1
++ br label %ENDIF
++
++ENDIF:
++ %2 = phi i32 [ %in, %entry ], [ %1, %IF ]
++ store i32 %2, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @simple_if_else
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++define void @simple_if_else(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sgt i32 %in, 0
++ br i1 %0, label %IF, label %ELSE
++
++IF:
++ %1 = shl i32 %in, 1
++ br label %ENDIF
++
++ELSE:
++ %2 = lshr i32 %in, 1
++ br label %ENDIF
++
++ENDIF:
++ %3 = phi i32 [ %1, %IF ], [ %2, %ELSE ]
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @nested_if
++; CHECK: IF_PREDICATE_SET
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: ENDIF
++define void @nested_if(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sgt i32 %in, 0
++ br i1 %0, label %IF0, label %ENDIF
++
++IF0:
++ %1 = add i32 %in, 10
++ %2 = icmp sgt i32 %1, 0
++ br i1 %2, label %IF1, label %ENDIF
++
++IF1:
++ %3 = shl i32 %1, 1
++ br label %ENDIF
++
++ENDIF:
++ %4 = phi i32 [%in, %entry], [%1, %IF0], [%3, %IF1]
++ store i32 %4, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @nested_if_else
++; CHECK: IF_PREDICATE_SET
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: ENDIF
++define void @nested_if_else(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sgt i32 %in, 0
++ br i1 %0, label %IF0, label %ENDIF
++
++IF0:
++ %1 = add i32 %in, 10
++ %2 = icmp sgt i32 %1, 0
++ br i1 %2, label %IF1, label %ELSE1
++
++IF1:
++ %3 = shl i32 %1, 1
++ br label %ENDIF
++
++ELSE1:
++ %4 = lshr i32 %in, 1
++ br label %ENDIF
++
++ENDIF:
++ %5 = phi i32 [%in, %entry], [%3, %IF1], [%4, %ELSE1]
++ store i32 %5, i32 addrspace(1)* %out
++ ret void
++}
diff --git a/test/CodeGen/R600/reciprocal.ll b/test/CodeGen/R600/reciprocal.ll
new file mode 100644
index 0000000..6838c1a
@@ -22517,7 +24436,7 @@ index 0000000..3556fac
+}
diff --git a/test/CodeGen/R600/selectcc-icmp-select-float.ll b/test/CodeGen/R600/selectcc-icmp-select-float.ll
new file mode 100644
-index 0000000..f65a300
+index 0000000..359ca1e
--- /dev/null
+++ b/test/CodeGen/R600/selectcc-icmp-select-float.ll
@@ -0,0 +1,15 @@
@@ -22525,7 +24444,7 @@ index 0000000..f65a300
+
+; Note additional optimizations may cause this SGT to be replaced with a
+; CND* instruction.
-+; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
+; Test a selectcc with i32 LHS/RHS and float True/False
+
+define void @test(float addrspace(1)* %out, i32 addrspace(1)* %in) {
@@ -22570,6 +24489,149 @@ index 0000000..b38078e
+ store i32 %3, i32 addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/set-dx10.ll b/test/CodeGen/R600/set-dx10.ll
+new file mode 100644
+index 0000000..54febcf
+--- /dev/null
++++ b/test/CodeGen/R600/set-dx10.ll
+@@ -0,0 +1,137 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests check that floating point comparisons which are used by select
++; to store integer true (-1) and false (0) values are lowered to one of the
++; SET*DX10 instructions.
++
++; CHECK: @fcmp_une_select_fptosi
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_une_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp une float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_une_select_i32
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_une_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp une float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ueq_select_fptosi
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ueq_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ueq float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ueq_select_i32
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ueq_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ueq float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ugt_select_fptosi
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ugt_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ugt float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ugt_select_i32
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ugt_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ugt float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_uge_select_fptosi
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_uge_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp uge float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_uge_select_i32
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_uge_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp uge float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ule_select_fptosi
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ule_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ule float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ule_select_i32
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ule_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ule float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ult_select_fptosi
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ult_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ult float %in, 5.0
++ %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++ %2 = fsub float -0.000000e+00, %1
++ %3 = fptosi float %2 to i32
++ store i32 %3, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @fcmp_ult_select_i32
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ult_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ult float %in, 5.0
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
diff --git a/test/CodeGen/R600/setcc.v4i32.ll b/test/CodeGen/R600/setcc.v4i32.ll
new file mode 100644
index 0000000..0752f2e
@@ -22590,12 +24652,13 @@ index 0000000..0752f2e
+}
diff --git a/test/CodeGen/R600/short-args.ll b/test/CodeGen/R600/short-args.ll
new file mode 100644
-index 0000000..1070250
+index 0000000..b69e327
--- /dev/null
+++ b/test/CodeGen/R600/short-args.ll
-@@ -0,0 +1,37 @@
+@@ -0,0 +1,41 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
++; CHECK: @i8_arg
+; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i8_arg(i32 addrspace(1)* nocapture %out, i8 %in) nounwind {
@@ -22605,6 +24668,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i8_zext_arg
+; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i8_zext_arg(i32 addrspace(1)* nocapture %out, i8 zeroext %in) nounwind {
@@ -22614,6 +24678,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i16_arg
+; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i16_arg(i32 addrspace(1)* nocapture %out, i16 %in) nounwind {
@@ -22623,6 +24688,7 @@ index 0000000..1070250
+ ret void
+}
+
++; CHECK: @i16_zext_arg
+; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
+
+define void @i16_zext_arg(i32 addrspace(1)* nocapture %out, i16 zeroext %in) nounwind {
@@ -22682,6 +24748,95 @@ index 0000000..47657a6
+ store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+ ret void
+}
+diff --git a/test/CodeGen/R600/unsupported-cc.ll b/test/CodeGen/R600/unsupported-cc.ll
+new file mode 100644
+index 0000000..b48c591
+--- /dev/null
++++ b/test/CodeGen/R600/unsupported-cc.ll
+@@ -0,0 +1,83 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests are for condition codes that are not supported by the hardware
++
++; CHECK: @slt
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @slt(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp slt i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ult_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @ult_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp ult i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ult_float
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ult_float(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ult float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @olt
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @olt(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp olt float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @sle
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @sle(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp sle i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ule_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @ule_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++ %0 = icmp ule i32 %in, 5
++ %1 = select i1 %0, i32 -1, i32 0
++ store i32 %1, i32 addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ule_float
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ule_float(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ule float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
++
++; CHECK: @ole
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ole(float addrspace(1)* %out, float %in) {
++entry:
++ %0 = fcmp ole float %in, 5.0
++ %1 = select i1 %0, float 1.0, float 0.0
++ store float %1, float addrspace(1)* %out
++ ret void
++}
diff --git a/test/CodeGen/R600/urem.v4i32.ll b/test/CodeGen/R600/urem.v4i32.ll
new file mode 100644
index 0000000..2e7388c
@@ -22705,15 +24860,13 @@ index 0000000..2e7388c
+}
diff --git a/test/CodeGen/R600/vec4-expand.ll b/test/CodeGen/R600/vec4-expand.ll
new file mode 100644
-index 0000000..47cbf82
+index 0000000..8f62bc6
--- /dev/null
+++ b/test/CodeGen/R600/vec4-expand.ll
-@@ -0,0 +1,52 @@
-+; There are bugs in the DAGCombiner that prevent this test from passing.
-+; XFAIL: *
-+
+@@ -0,0 +1,53 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
++; CHECK: @fp_to_sint
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22726,6 +24879,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @fp_to_uint
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22738,6 +24892,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @sint_to_fp
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22750,6 +24905,7 @@ index 0000000..47cbf82
+ ret void
+}
+
++; CHECK: @uint_to_fp
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22804,6 +24960,15 @@ index 0000000..62cdcf5
+declare <4 x float> @llvm.SI.vs.load.input(<4 x i32>, i32, i32)
+
+declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
---
-1.8.0.2
-
+diff --git a/test/CodeGen/X86/cvtv2f32.ll b/test/CodeGen/X86/cvtv2f32.ll
+index 466b096..d11bb9e 100644
+--- a/test/CodeGen/X86/cvtv2f32.ll
++++ b/test/CodeGen/X86/cvtv2f32.ll
+@@ -1,3 +1,7 @@
++; A bug fix in the DAGCombiner made this test fail, so marking as xfail
++; until this can be investigated further.
++; XFAIL: *
++
+ ; RUN: llc < %s -mtriple=i686-linux-pc -mcpu=corei7 | FileCheck %s
+
+ define <2 x float> @foo(i32 %x, i32 %y, <2 x float> %v) {
diff --git a/sys-devel/llvm/llvm-3.2.ebuild b/sys-devel/llvm/llvm-3.2.ebuild
index 7171bfc..ceb16bb 100644
--- a/sys-devel/llvm/llvm-3.2.ebuild
+++ b/sys-devel/llvm/llvm-3.2.ebuild
@@ -1,33 +1,38 @@
-# Copyright 1999-2012 Gentoo Foundation
+# Copyright 1999-2013 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
-# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.1 2012/12/21 09:18:12 voyageur Exp $
+# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.6 2013/02/27 06:02:15 zmedico Exp $
EAPI=5
-PYTHON_DEPEND="2"
-inherit eutils flag-o-matic multilib toolchain-funcs python pax-utils
+
+# pypy gives me around 1700 unresolved tests due to open file limit
+# being exceeded. probably GC does not close them fast enough.
+PYTHON_COMPAT=( python{2_5,2_6,2_7} )
+
+inherit eutils flag-o-matic multilib python-any-r1 toolchain-funcs pax-utils
DESCRIPTION="Low Level Virtual Machine"
HOMEPAGE="http://llvm.org/"
-SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz"
+SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz
+ !doc? ( http://dev.gentoo.org/~voyageur/distfiles/${P}-manpages.tar.bz2 )"
LICENSE="UoI-NCSA"
SLOT="0"
-KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~x86-linux ~ppc-macos ~x64-macos"
+KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~arm-linux ~x86-linux ~ppc-macos ~x64-macos"
IUSE="debug doc gold +libffi multitarget ocaml test udis86 vim-syntax"
DEPEND="dev-lang/perl
- dev-python/sphinx
>=sys-devel/make-3.79
>=sys-devel/flex-2.5.4
>=sys-devel/bison-1.875d
|| ( >=sys-devel/gcc-3.0 >=sys-devel/gcc-apple-4.2.1 )
|| ( >=sys-devel/binutils-2.18 >=sys-devel/binutils-apple-3.2.3 )
+ doc? ( dev-python/sphinx )
gold? ( >=sys-devel/binutils-2.22[cxx] )
libffi? ( virtual/pkgconfig
virtual/libffi )
ocaml? ( dev-lang/ocaml )
- udis86? ( amd64? ( dev-libs/udis86[pic] )
- !amd64? ( dev-libs/udis86 ) )"
+ udis86? ( dev-libs/udis86[pic(+)] )
+ ${PYTHON_DEPS}"
RDEPEND="dev-lang/perl
libffi? ( virtual/libffi )
vim-syntax? ( || ( app-editors/vim app-editors/gvim ) )"
@@ -36,8 +41,7 @@ S=${WORKDIR}/${P}.src
pkg_setup() {
# Required for test and build
- python_set_active_version 2
- python_pkg_setup
+ python-any-r1_pkg_setup
# need to check if the active compiler is ok
@@ -64,12 +68,12 @@ pkg_setup() {
if [[ ${CHOST} == x86_64-* && ${broken_gcc_amd64} == *" ${version} "* ]];
then
- elog "Your version of gcc is known to miscompile llvm in amd64"
- elog "architectures. Check"
- elog "http://www.llvm.org/docs/GettingStarted.html for possible"
- elog "solutions."
+ elog "Your version of gcc is known to miscompile llvm in amd64"
+ elog "architectures. Check"
+ elog "http://www.llvm.org/docs/GettingStarted.html for possible"
+ elog "solutions."
die "Your currently active version of gcc is known to miscompile llvm"
- fi
+ fi
}
src_prepare() {
@@ -96,12 +100,9 @@ src_prepare() {
sed -e "/NO_INSTALL = 1/s/^/#/" -i utils/FileCheck/Makefile \
|| die "FileCheck Makefile sed failed"
- # Specify python version
- python_convert_shebangs -r 2 test/Scripts
-
epatch "${FILESDIR}"/${PN}-3.2-nodoctargz.patch
epatch "${FILESDIR}"/${PN}-3.0-PPC_macro.patch
- epatch "${FILESDIR}"/0001-Add-R600-backend.patch
+ epatch "${FILESDIR}"/R600-Mesa-9.1.patch
# User patches
epatch_user
@@ -150,20 +151,28 @@ src_configure() {
src_compile() {
emake VERBOSE=1 KEEP_SYMBOLS=1 REQUIRES_RTTI=1
- emake -C docs -f Makefile.sphinx man
- use doc && emake -C docs -f Makefile.sphinx html
+ if use doc; then
+ emake -C docs -f Makefile.sphinx man html
+ fi
+ # emake -C docs -f Makefile.sphinx html
pax-mark m Release/bin/lli
if use test; then
pax-mark m unittests/ExecutionEngine/JIT/Release/JITTests
+ pax-mark m unittests/ExecutionEngine/MCJIT/Release/MCJITTests
+ pax-mark m unittests/Support/Release/SupportTests
fi
}
src_install() {
emake KEEP_SYMBOLS=1 DESTDIR="${D}" install
- doman docs/_build/man/*.1
- use doc && dohtml -r docs/_build/html/
+ if use doc; then
+ doman docs/_build/man/*.1
+ dohtml -r docs/_build/html/
+ else
+ doman "${WORKDIR}"/${P}-manpages/*.1
+ fi
if use vim-syntax; then
insinto /usr/share/vim/vimfiles/syntax
diff --git a/sys-devel/llvm/metadata.xml b/sys-devel/llvm/metadata.xml
index e5a362b..38e16d8 100644
--- a/sys-devel/llvm/metadata.xml
+++ b/sys-devel/llvm/metadata.xml
@@ -16,7 +16,6 @@
4. LLVM does not imply things that you would expect from a high-level virtual machine. It does not require garbage collection or run-time code generation (In fact, LLVM makes a great static compiler!). Note that optional LLVM components can be used to build high-level virtual machines and other systems that need these services.</longdescription>
<use>
<flag name='gold'>Build the gold linker plugin</flag>
- <flag name='llvm-gcc'>Build LLVM with <pkg>sys-devel/llvm-gcc</pkg></flag>
<flag name='multitarget'>Build all host targets (default: host only)</flag>
<flag name='udis86'>Enable support for <pkg>dev-libs/udis86</pkg> disassembler library</flag>
</use>
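
For reference, every new R600 test added above follows the same pattern from
its RUN line: pipe the .ll file through llc and verify the emitted assembly
with FileCheck. So an individual test can be exercised by hand from an LLVM
build tree roughly like this (the Release/bin paths are illustrative and
depend on your build configuration):

    $ ./Release/bin/llc < test/CodeGen/R600/fsub.ll -march=r600 -mcpu=redwood \
        | ./Release/bin/FileCheck test/CodeGen/R600/fsub.ll

This is also why the ebuild un-comments "NO_INSTALL = 1" in the FileCheck
Makefile: the full suite (run via the test USE flag) needs the FileCheck
binary available.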