From: "Alexey Shvetsov" <alexxy@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/x11:opencl commit in: sys-devel/llvm/files/, sys-devel/llvm/
Date: Tue,  5 Mar 2013 05:38:17 +0000 (UTC)
Message-ID: <1362461694.9597d3a8e121c0e0961de5642b1f550700dc8496.alexxy@gentoo>

commit:     9597d3a8e121c0e0961de5642b1f550700dc8496
Author:     Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
AuthorDate: Tue Mar  5 05:34:54 2013 +0000
Commit:     Alexey Shvetsov <alexxy <AT> gentoo <DOT> org>
CommitDate: Tue Mar  5 05:34:54 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/x11.git;a=commit;h=9597d3a8

Update llvm R600 patch

Package-Manager: portage-2.2.0_alpha166
RepoMan-Options: --force

---
 ...-Add-R600-backend.patch => R600-Mesa-9.1.patch} | 7809 +++++++++++++-------
 sys-devel/llvm/llvm-3.2.ebuild                     |   57 +-
 sys-devel/llvm/metadata.xml                        |    1 -
 3 files changed, 5020 insertions(+), 2847 deletions(-)

diff --git a/sys-devel/llvm/files/0001-Add-R600-backend.patch b/sys-devel/llvm/files/R600-Mesa-9.1.patch
similarity index 81%
rename from sys-devel/llvm/files/0001-Add-R600-backend.patch
rename to sys-devel/llvm/files/R600-Mesa-9.1.patch
index 4ebe499..9b9e1f5 100644
--- a/sys-devel/llvm/files/0001-Add-R600-backend.patch
+++ b/sys-devel/llvm/files/R600-Mesa-9.1.patch
@@ -1,517 +1,46 @@
-From 07d146158af424e4c0aa85a3de49516d97affbb9 Mon Sep 17 00:00:00 2001
-From: Tom Stellard <thomas.stellard@amd.com>
-Date: Tue, 11 Dec 2012 21:25:42 +0000
-Subject: [PATCH] Add R600 backend
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169915 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
-	lib/Target/LLVMBuild.txt
-
-[CMake] Fixup R600.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169962 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Avoid setIsInsideBundle in Target/R600.
-
-This function is going to be removed.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170064 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: remove nonsense setPrefLoopAlignment
-
-The Align parameter is a power-of-two exponent, so 16 results in
-64K (2^16) alignment. Beyond that, even 16 byte alignment doesn't
-make any sense, so just remove it.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170341 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: BB operand support for SI
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170342 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: enable S_*N2_* instructions
-
-They seem to work fine.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170343 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: New control flow for SI v2
-
-This patch replaces the control flow handling with a new
-pass which structurizes the graph before transforming it to
-machine instructions. This has a couple of different advantages
-and currently fixes 20 piglit tests without a single regression.
-
-It is now a general purpose transformation that could be used
-not only for SI/R6xx, but also for other hardware
-implementations that use a form of structurized control flow.
-
-v2: further cleanup, fixes and documentation
-
-Patch by: Christian König
-
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170591 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: control flow optimization
-
-Branch if we have enough instructions for it to make sense,
-and remove branches where they don't.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170592 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Remove unnecessary VREG alignment.
-
-Unlike SGPRs, VGPRs don't need to be aligned.
-
-Patch by: Christian König
-
-Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
-Signed-off-by: Christian König <deathsimple@vodafone.de>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170593 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add entry in CODE_OWNERS.TXT
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170594 91177308-0d34-0410-b5e6-96231b3b80d8
-
-Conflicts:
-	CODE_OWNERS.TXT
-
-Target/R600: Update MIB according to r170588.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170620 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Expand vec4 INT <-> FP conversions
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170901 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Add SHADOWCUBE to TEX_SHADOW pattern
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170921 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Fix MAX_UINT definition
-
-Patch by: Vadim Girlin
-
-Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170922 91177308-0d34-0410-b5e6-96231b3b80d8
-
-R600: Coding style - remove empty spaces from the beginning of functions
-
-No functionality change.
-
-git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170923 91177308-0d34-0410-b5e6-96231b3b80d8
----
- CODE_OWNERS.TXT                                    |   14 +
- include/llvm/Intrinsics.td                         |    1 +
- include/llvm/IntrinsicsR600.td                     |   36 +
- lib/Target/LLVMBuild.txt                           |    2 +-
- lib/Target/R600/AMDGPU.h                           |   49 +
- lib/Target/R600/AMDGPU.td                          |   40 +
- lib/Target/R600/AMDGPUAsmPrinter.cpp               |  138 +
- lib/Target/R600/AMDGPUAsmPrinter.h                 |   44 +
- lib/Target/R600/AMDGPUCodeEmitter.h                |   49 +
- lib/Target/R600/AMDGPUConvertToISA.cpp             |   62 +
- lib/Target/R600/AMDGPUISelLowering.cpp             |  417 +++
- lib/Target/R600/AMDGPUISelLowering.h               |  144 +
- lib/Target/R600/AMDGPUInstrInfo.cpp                |  257 ++
- lib/Target/R600/AMDGPUInstrInfo.h                  |  149 +
- lib/Target/R600/AMDGPUInstrInfo.td                 |   74 +
- lib/Target/R600/AMDGPUInstructions.td              |  190 ++
- lib/Target/R600/AMDGPUIntrinsics.td                |   62 +
- lib/Target/R600/AMDGPUMCInstLower.cpp              |   83 +
- lib/Target/R600/AMDGPUMCInstLower.h                |   34 +
- lib/Target/R600/AMDGPURegisterInfo.cpp             |   51 +
- lib/Target/R600/AMDGPURegisterInfo.h               |   63 +
- lib/Target/R600/AMDGPURegisterInfo.td              |   22 +
- lib/Target/R600/AMDGPUStructurizeCFG.cpp           |  714 +++++
- lib/Target/R600/AMDGPUSubtarget.cpp                |   87 +
- lib/Target/R600/AMDGPUSubtarget.h                  |   65 +
- lib/Target/R600/AMDGPUTargetMachine.cpp            |  142 +
- lib/Target/R600/AMDGPUTargetMachine.h              |   70 +
- lib/Target/R600/AMDIL.h                            |  106 +
- lib/Target/R600/AMDIL7XXDevice.cpp                 |  115 +
- lib/Target/R600/AMDIL7XXDevice.h                   |   72 +
- lib/Target/R600/AMDILBase.td                       |   85 +
- lib/Target/R600/AMDILCFGStructurizer.cpp           | 3049 ++++++++++++++++++++
- lib/Target/R600/AMDILDevice.cpp                    |  124 +
- lib/Target/R600/AMDILDevice.h                      |  117 +
- lib/Target/R600/AMDILDeviceInfo.cpp                |   94 +
- lib/Target/R600/AMDILDeviceInfo.h                  |   88 +
- lib/Target/R600/AMDILDevices.h                     |   19 +
- lib/Target/R600/AMDILEvergreenDevice.cpp           |  169 ++
- lib/Target/R600/AMDILEvergreenDevice.h             |   93 +
- lib/Target/R600/AMDILFrameLowering.cpp             |   47 +
- lib/Target/R600/AMDILFrameLowering.h               |   40 +
- lib/Target/R600/AMDILISelDAGToDAG.cpp              |  485 ++++
- lib/Target/R600/AMDILISelLowering.cpp              |  651 +++++
- lib/Target/R600/AMDILInstrInfo.td                  |  208 ++
- lib/Target/R600/AMDILIntrinsicInfo.cpp             |   79 +
- lib/Target/R600/AMDILIntrinsicInfo.h               |   49 +
- lib/Target/R600/AMDILIntrinsics.td                 |  242 ++
- lib/Target/R600/AMDILNIDevice.cpp                  |   65 +
- lib/Target/R600/AMDILNIDevice.h                    |   57 +
- lib/Target/R600/AMDILPeepholeOptimizer.cpp         | 1215 ++++++++
- lib/Target/R600/AMDILRegisterInfo.td               |  107 +
- lib/Target/R600/AMDILSIDevice.cpp                  |   45 +
- lib/Target/R600/AMDILSIDevice.h                    |   39 +
- lib/Target/R600/CMakeLists.txt                     |   55 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp  |  132 +
- lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h    |   52 +
- lib/Target/R600/InstPrinter/CMakeLists.txt         |    7 +
- lib/Target/R600/InstPrinter/LLVMBuild.txt          |   24 +
- lib/Target/R600/InstPrinter/Makefile               |   15 +
- lib/Target/R600/LLVMBuild.txt                      |   32 +
- lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp  |   90 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp   |   85 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h     |   30 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h |   60 +
- .../R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp       |  113 +
- lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h  |   55 +
- lib/Target/R600/MCTargetDesc/CMakeLists.txt        |   10 +
- lib/Target/R600/MCTargetDesc/LLVMBuild.txt         |   23 +
- lib/Target/R600/MCTargetDesc/Makefile              |   16 +
- lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |  575 ++++
- lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp   |  298 ++
- lib/Target/R600/Makefile                           |   23 +
- lib/Target/R600/Processors.td                      |   29 +
- lib/Target/R600/R600Defines.h                      |   79 +
- lib/Target/R600/R600ExpandSpecialInstrs.cpp        |  334 +++
- lib/Target/R600/R600ISelLowering.cpp               |  909 ++++++
- lib/Target/R600/R600ISelLowering.h                 |   72 +
- lib/Target/R600/R600InstrInfo.cpp                  |  665 +++++
- lib/Target/R600/R600InstrInfo.h                    |  169 ++
- lib/Target/R600/R600Instructions.td                | 1724 +++++++++++
- lib/Target/R600/R600Intrinsics.td                  |   32 +
- lib/Target/R600/R600MachineFunctionInfo.cpp        |   34 +
- lib/Target/R600/R600MachineFunctionInfo.h          |   39 +
- lib/Target/R600/R600RegisterInfo.cpp               |   89 +
- lib/Target/R600/R600RegisterInfo.h                 |   55 +
- lib/Target/R600/R600RegisterInfo.td                |  107 +
- lib/Target/R600/R600Schedule.td                    |   36 +
- lib/Target/R600/SIAnnotateControlFlow.cpp          |  330 +++
- lib/Target/R600/SIAssignInterpRegs.cpp             |  152 +
- lib/Target/R600/SIISelLowering.cpp                 |  512 ++++
- lib/Target/R600/SIISelLowering.h                   |   62 +
- lib/Target/R600/SIInstrFormats.td                  |  146 +
- lib/Target/R600/SIInstrInfo.cpp                    |   90 +
- lib/Target/R600/SIInstrInfo.h                      |   62 +
- lib/Target/R600/SIInstrInfo.td                     |  589 ++++
- lib/Target/R600/SIInstructions.td                  | 1351 +++++++++
- lib/Target/R600/SIIntrinsics.td                    |   52 +
- lib/Target/R600/SILowerControlFlow.cpp             |  331 +++
- lib/Target/R600/SILowerLiteralConstants.cpp        |  108 +
- lib/Target/R600/SIMachineFunctionInfo.cpp          |   20 +
- lib/Target/R600/SIMachineFunctionInfo.h            |   34 +
- lib/Target/R600/SIRegisterInfo.cpp                 |   48 +
- lib/Target/R600/SIRegisterInfo.h                   |   47 +
- lib/Target/R600/SIRegisterInfo.td                  |  167 ++
- lib/Target/R600/SISchedule.td                      |   15 +
- lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp    |   26 +
- lib/Target/R600/TargetInfo/CMakeLists.txt          |    7 +
- lib/Target/R600/TargetInfo/LLVMBuild.txt           |   23 +
- lib/Target/R600/TargetInfo/Makefile                |   15 +
- test/CodeGen/R600/add.v4i32.ll                     |   15 +
- test/CodeGen/R600/and.v4i32.ll                     |   15 +
- test/CodeGen/R600/fabs.ll                          |   16 +
- test/CodeGen/R600/fadd.ll                          |   16 +
- test/CodeGen/R600/fadd.v4f32.ll                    |   15 +
- test/CodeGen/R600/fcmp-cnd.ll                      |   14 +
- test/CodeGen/R600/fcmp-cnde-int-args.ll            |   16 +
- test/CodeGen/R600/fcmp.ll                          |   16 +
- test/CodeGen/R600/fdiv.v4f32.ll                    |   19 +
- test/CodeGen/R600/floor.ll                         |   16 +
- test/CodeGen/R600/fmax.ll                          |   16 +
- test/CodeGen/R600/fmin.ll                          |   16 +
- test/CodeGen/R600/fmul.ll                          |   16 +
- test/CodeGen/R600/fmul.v4f32.ll                    |   15 +
- test/CodeGen/R600/fsub.ll                          |   17 +
- test/CodeGen/R600/fsub.v4f32.ll                    |   15 +
- test/CodeGen/R600/i8_to_double_to_float.ll         |   11 +
- test/CodeGen/R600/icmp-select-sete-reverse-args.ll |   18 +
- test/CodeGen/R600/lit.local.cfg                    |   13 +
- test/CodeGen/R600/literals.ll                      |   30 +
- test/CodeGen/R600/llvm.AMDGPU.mul.ll               |   17 +
- test/CodeGen/R600/llvm.AMDGPU.trunc.ll             |   16 +
- test/CodeGen/R600/llvm.cos.ll                      |   16 +
- test/CodeGen/R600/llvm.pow.ll                      |   19 +
- test/CodeGen/R600/llvm.sin.ll                      |   16 +
- test/CodeGen/R600/load.constant_addrspace.f32.ll   |    9 +
- test/CodeGen/R600/load.i8.ll                       |   10 +
- test/CodeGen/R600/reciprocal.ll                    |   16 +
- test/CodeGen/R600/sdiv.ll                          |   21 +
- test/CodeGen/R600/selectcc-icmp-select-float.ll    |   15 +
- test/CodeGen/R600/selectcc_cnde.ll                 |   11 +
- test/CodeGen/R600/selectcc_cnde_int.ll             |   11 +
- test/CodeGen/R600/setcc.v4i32.ll                   |   12 +
- test/CodeGen/R600/short-args.ll                    |   37 +
- test/CodeGen/R600/store.v4f32.ll                   |    9 +
- test/CodeGen/R600/store.v4i32.ll                   |    9 +
- test/CodeGen/R600/udiv.v4i32.ll                    |   15 +
- test/CodeGen/R600/urem.v4i32.ll                    |   15 +
- test/CodeGen/R600/vec4-expand.ll                   |   52 +
- test/CodeGen/SI/sanity.ll                          |   37 +
- 149 files changed, 21461 insertions(+), 1 deletion(-)
- create mode 100644 include/llvm/IntrinsicsR600.td
- create mode 100644 lib/Target/R600/AMDGPU.h
- create mode 100644 lib/Target/R600/AMDGPU.td
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.cpp
- create mode 100644 lib/Target/R600/AMDGPUAsmPrinter.h
- create mode 100644 lib/Target/R600/AMDGPUCodeEmitter.h
- create mode 100644 lib/Target/R600/AMDGPUConvertToISA.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDGPUISelLowering.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.h
- create mode 100644 lib/Target/R600/AMDGPUInstrInfo.td
- create mode 100644 lib/Target/R600/AMDGPUInstructions.td
- create mode 100644 lib/Target/R600/AMDGPUIntrinsics.td
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.cpp
- create mode 100644 lib/Target/R600/AMDGPUMCInstLower.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.cpp
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.h
- create mode 100644 lib/Target/R600/AMDGPURegisterInfo.td
- create mode 100644 lib/Target/R600/AMDGPUStructurizeCFG.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.cpp
- create mode 100644 lib/Target/R600/AMDGPUSubtarget.h
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.cpp
- create mode 100644 lib/Target/R600/AMDGPUTargetMachine.h
- create mode 100644 lib/Target/R600/AMDIL.h
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.cpp
- create mode 100644 lib/Target/R600/AMDIL7XXDevice.h
- create mode 100644 lib/Target/R600/AMDILBase.td
- create mode 100644 lib/Target/R600/AMDILCFGStructurizer.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.cpp
- create mode 100644 lib/Target/R600/AMDILDevice.h
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.cpp
- create mode 100644 lib/Target/R600/AMDILDeviceInfo.h
- create mode 100644 lib/Target/R600/AMDILDevices.h
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.cpp
- create mode 100644 lib/Target/R600/AMDILEvergreenDevice.h
- create mode 100644 lib/Target/R600/AMDILFrameLowering.cpp
- create mode 100644 lib/Target/R600/AMDILFrameLowering.h
- create mode 100644 lib/Target/R600/AMDILISelDAGToDAG.cpp
- create mode 100644 lib/Target/R600/AMDILISelLowering.cpp
- create mode 100644 lib/Target/R600/AMDILInstrInfo.td
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.cpp
- create mode 100644 lib/Target/R600/AMDILIntrinsicInfo.h
- create mode 100644 lib/Target/R600/AMDILIntrinsics.td
- create mode 100644 lib/Target/R600/AMDILNIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILNIDevice.h
- create mode 100644 lib/Target/R600/AMDILPeepholeOptimizer.cpp
- create mode 100644 lib/Target/R600/AMDILRegisterInfo.td
- create mode 100644 lib/Target/R600/AMDILSIDevice.cpp
- create mode 100644 lib/Target/R600/AMDILSIDevice.h
- create mode 100644 lib/Target/R600/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
- create mode 100644 lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
- create mode 100644 lib/Target/R600/InstPrinter/CMakeLists.txt
- create mode 100644 lib/Target/R600/InstPrinter/LLVMBuild.txt
- create mode 100644 lib/Target/R600/InstPrinter/Makefile
- create mode 100644 lib/Target/R600/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUAsmBackend.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCAsmInfo.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/AMDGPUMCTargetDesc.h
- create mode 100644 lib/Target/R600/MCTargetDesc/CMakeLists.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/LLVMBuild.txt
- create mode 100644 lib/Target/R600/MCTargetDesc/Makefile
- create mode 100644 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
- create mode 100644 lib/Target/R600/Makefile
- create mode 100644 lib/Target/R600/Processors.td
- create mode 100644 lib/Target/R600/R600Defines.h
- create mode 100644 lib/Target/R600/R600ExpandSpecialInstrs.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.cpp
- create mode 100644 lib/Target/R600/R600ISelLowering.h
- create mode 100644 lib/Target/R600/R600InstrInfo.cpp
- create mode 100644 lib/Target/R600/R600InstrInfo.h
- create mode 100644 lib/Target/R600/R600Instructions.td
- create mode 100644 lib/Target/R600/R600Intrinsics.td
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/R600MachineFunctionInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.cpp
- create mode 100644 lib/Target/R600/R600RegisterInfo.h
- create mode 100644 lib/Target/R600/R600RegisterInfo.td
- create mode 100644 lib/Target/R600/R600Schedule.td
- create mode 100644 lib/Target/R600/SIAnnotateControlFlow.cpp
- create mode 100644 lib/Target/R600/SIAssignInterpRegs.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.cpp
- create mode 100644 lib/Target/R600/SIISelLowering.h
- create mode 100644 lib/Target/R600/SIInstrFormats.td
- create mode 100644 lib/Target/R600/SIInstrInfo.cpp
- create mode 100644 lib/Target/R600/SIInstrInfo.h
- create mode 100644 lib/Target/R600/SIInstrInfo.td
- create mode 100644 lib/Target/R600/SIInstructions.td
- create mode 100644 lib/Target/R600/SIIntrinsics.td
- create mode 100644 lib/Target/R600/SILowerControlFlow.cpp
- create mode 100644 lib/Target/R600/SILowerLiteralConstants.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.cpp
- create mode 100644 lib/Target/R600/SIMachineFunctionInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.cpp
- create mode 100644 lib/Target/R600/SIRegisterInfo.h
- create mode 100644 lib/Target/R600/SIRegisterInfo.td
- create mode 100644 lib/Target/R600/SISchedule.td
- create mode 100644 lib/Target/R600/TargetInfo/AMDGPUTargetInfo.cpp
- create mode 100644 lib/Target/R600/TargetInfo/CMakeLists.txt
- create mode 100644 lib/Target/R600/TargetInfo/LLVMBuild.txt
- create mode 100644 lib/Target/R600/TargetInfo/Makefile
- create mode 100644 test/CodeGen/R600/add.v4i32.ll
- create mode 100644 test/CodeGen/R600/and.v4i32.ll
- create mode 100644 test/CodeGen/R600/fabs.ll
- create mode 100644 test/CodeGen/R600/fadd.ll
- create mode 100644 test/CodeGen/R600/fadd.v4f32.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnd.ll
- create mode 100644 test/CodeGen/R600/fcmp-cnde-int-args.ll
- create mode 100644 test/CodeGen/R600/fcmp.ll
- create mode 100644 test/CodeGen/R600/fdiv.v4f32.ll
- create mode 100644 test/CodeGen/R600/floor.ll
- create mode 100644 test/CodeGen/R600/fmax.ll
- create mode 100644 test/CodeGen/R600/fmin.ll
- create mode 100644 test/CodeGen/R600/fmul.ll
- create mode 100644 test/CodeGen/R600/fmul.v4f32.ll
- create mode 100644 test/CodeGen/R600/fsub.ll
- create mode 100644 test/CodeGen/R600/fsub.v4f32.ll
- create mode 100644 test/CodeGen/R600/i8_to_double_to_float.ll
- create mode 100644 test/CodeGen/R600/icmp-select-sete-reverse-args.ll
- create mode 100644 test/CodeGen/R600/lit.local.cfg
- create mode 100644 test/CodeGen/R600/literals.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.mul.ll
- create mode 100644 test/CodeGen/R600/llvm.AMDGPU.trunc.ll
- create mode 100644 test/CodeGen/R600/llvm.cos.ll
- create mode 100644 test/CodeGen/R600/llvm.pow.ll
- create mode 100644 test/CodeGen/R600/llvm.sin.ll
- create mode 100644 test/CodeGen/R600/load.constant_addrspace.f32.ll
- create mode 100644 test/CodeGen/R600/load.i8.ll
- create mode 100644 test/CodeGen/R600/reciprocal.ll
- create mode 100644 test/CodeGen/R600/sdiv.ll
- create mode 100644 test/CodeGen/R600/selectcc-icmp-select-float.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde.ll
- create mode 100644 test/CodeGen/R600/selectcc_cnde_int.ll
- create mode 100644 test/CodeGen/R600/setcc.v4i32.ll
- create mode 100644 test/CodeGen/R600/short-args.ll
- create mode 100644 test/CodeGen/R600/store.v4f32.ll
- create mode 100644 test/CodeGen/R600/store.v4i32.ll
- create mode 100644 test/CodeGen/R600/udiv.v4i32.ll
- create mode 100644 test/CodeGen/R600/urem.v4i32.ll
- create mode 100644 test/CodeGen/R600/vec4-expand.ll
- create mode 100644 test/CodeGen/SI/sanity.ll
-
-diff --git a/CODE_OWNERS.TXT b/CODE_OWNERS.TXT
-index fd7bcda..90285be 100644
---- a/CODE_OWNERS.TXT
-+++ b/CODE_OWNERS.TXT
-@@ -49,3 +49,17 @@ D: Register allocators and TableGen
- N: Duncan Sands
- E: baldrick@free.fr
- D: DragonEgg
-+
-+N: Tom Stellard
-+E: thomas.stellard@amd.com
-+E: mesa-dev@lists.freedesktop.org
-+D: R600 Backend
-+
-+N: Andrew Trick
-+E: atrick@apple.com
-+D: IndVar Simplify, Loop Strength Reduction, Instruction Scheduling
-+
-+N: Bill Wendling
-+E: wendling@apple.com
-+D: libLTO & IR Linker
-+
-diff --git a/include/llvm/Intrinsics.td b/include/llvm/Intrinsics.td
-index 2e1597f..059bd80 100644
---- a/include/llvm/Intrinsics.td
-+++ b/include/llvm/Intrinsics.td
-@@ -469,3 +469,4 @@ include "llvm/IntrinsicsXCore.td"
- include "llvm/IntrinsicsHexagon.td"
- include "llvm/IntrinsicsNVVM.td"
- include "llvm/IntrinsicsMips.td"
-+include "llvm/IntrinsicsR600.td"
-diff --git a/include/llvm/IntrinsicsR600.td b/include/llvm/IntrinsicsR600.td
-new file mode 100644
-index 0000000..ecb5668
---- /dev/null
-+++ b/include/llvm/IntrinsicsR600.td
-@@ -0,0 +1,36 @@
-+//===- IntrinsicsR600.td - Defines R600 intrinsics ---------*- tablegen -*-===//
-+//
-+//                     The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+// This file defines all of the R600-specific intrinsics.
-+//
-+//===----------------------------------------------------------------------===//
-+
-+let TargetPrefix = "r600" in {
-+
-+class R600ReadPreloadRegisterIntrinsic<string name>
-+  : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
-+    GCCBuiltin<name>;
-+
-+multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
-+  def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
-+  def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
-+  def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
-+}
-+
-+defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+                                       "__builtin_r600_read_global_size">;
-+defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
-+                                       "__builtin_r600_read_local_size">;
-+defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
-+                                       "__builtin_r600_read_ngroups">;
-+defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
-+                                       "__builtin_r600_read_tgid">;
-+defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
-+                                       "__builtin_r600_read_tidig">;
-+} // End TargetPrefix = "r600"
+diff --git a/autoconf/configure.ac b/autoconf/configure.ac
+index 7715531..1330c36 100644
+--- a/autoconf/configure.ac
++++ b/autoconf/configure.ac
+@@ -751,6 +751,11 @@ AC_ARG_ENABLE([experimental-targets],AS_HELP_STRING([--enable-experimental-targe
+ 
+ if test ${enableval} != "disable"
+ then
++  if test ${enableval} = "AMDGPU"
++  then
++    AC_MSG_ERROR([The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600])
++    enableval="R600"
++  fi
+   TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+ 
+diff --git a/configure b/configure
+index 4fa0705..02012b9 100755
+--- a/configure
++++ b/configure
+@@ -5473,6 +5473,13 @@ fi
+ 
+ if test ${enableval} != "disable"
+ then
++  if test ${enableval} = "AMDGPU"
++  then
++    { { echo "$as_me:$LINENO: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&5
++echo "$as_me: error: The AMDGPU target has been renamed to R600, please reconfigure with --enable-experimental-targets=R600" >&2;}
++   { (exit 1); exit 1; }; }
++    enableval="R600"
++  fi
+   TARGETS_TO_BUILD="$enableval $TARGETS_TO_BUILD"
+ fi
+ 
+@@ -10316,7 +10323,7 @@ else
+   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
+   lt_status=$lt_dlunknown
+   cat > conftest.$ac_ext <<EOF
+-#line 10317 "configure"
++#line 10326 "configure"
+ #include "confdefs.h"
+ 
+ #if HAVE_DLFCN_H
 diff --git a/lib/Target/LLVMBuild.txt b/lib/Target/LLVMBuild.txt
 index 8995080..84c4111 100644
 --- a/lib/Target/LLVMBuild.txt
@@ -527,10 +56,10 @@ index 8995080..84c4111 100644
  ; with the best execution engine (the native JIT, if available, or the
 diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
 new file mode 100644
-index 0000000..0f5125d
+index 0000000..ba87918
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPU.h
-@@ -0,0 +1,49 @@
+@@ -0,0 +1,51 @@
 +//===-- AMDGPU.h - MachineFunction passes hw codegen --------------*- C++ -*-=//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -556,17 +85,19 @@ index 0000000..0f5125d
 +// R600 Passes
 +FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
 +FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm);
 +
 +// SI Passes
 +FunctionPass *createSIAnnotateControlFlowPass();
 +FunctionPass *createSIAssignInterpRegsPass(TargetMachine &tm);
 +FunctionPass *createSILowerControlFlowPass(TargetMachine &tm);
 +FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
-+FunctionPass *createSILowerLiteralConstantsPass(TargetMachine &tm);
++FunctionPass *createSIInsertWaits(TargetMachine &tm);
 +
 +// Passes common to R600 and SI
 +Pass *createAMDGPUStructurizeCFGPass();
 +FunctionPass *createAMDGPUConvertToISAPass(TargetMachine &tm);
++FunctionPass* createAMDGPUIndirectAddressingPass(TargetMachine &tm);
 +
 +} // End namespace llvm
 +
@@ -628,10 +159,10 @@ index 0000000..40f4741
 +include "AMDGPUInstructions.td"
 diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp b/lib/Target/R600/AMDGPUAsmPrinter.cpp
 new file mode 100644
-index 0000000..4553c45
+index 0000000..254e62e
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
-@@ -0,0 +1,138 @@
+@@ -0,0 +1,145 @@
 +//===-- AMDGPUAsmPrinter.cpp - AMDGPU Assembly printer --------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -681,6 +212,9 @@ index 0000000..4553c45
 +#endif
 +  }
 +  SetupMachineFunction(MF);
++  if (OutStreamer.hasRawTextSupport()) {
++    OutStreamer.EmitRawText("@" + MF.getName() + ":");
++  }
 +  OutStreamer.SwitchSection(getObjFileLowering().getTextSection());
 +  if (STM.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
 +    EmitProgramInfo(MF);
@@ -722,8 +256,6 @@ index 0000000..4553c45
 +        switch (reg) {
 +        default: break;
 +        case AMDGPU::EXEC:
-+        case AMDGPU::SI_LITERAL_CONSTANT:
-+        case AMDGPU::SREG_LIT_0:
 +        case AMDGPU::M0:
 +          continue;
 +        }
@@ -749,10 +281,16 @@ index 0000000..4553c45
 +        } else if (AMDGPU::SReg_256RegClass.contains(reg)) {
 +          isSGPR = true;
 +          width = 8;
++        } else if (AMDGPU::VReg_256RegClass.contains(reg)) {
++          isSGPR = false;
++          width = 8;
++        } else if (AMDGPU::VReg_512RegClass.contains(reg)) {
++          isSGPR = false;
++          width = 16;
 +        } else {
 +          assert(!"Unknown register class");
 +        }
-+        hwReg = RI->getEncodingValue(reg);
++        hwReg = RI->getEncodingValue(reg) & 0xff;
 +        maxUsed = hwReg + width - 1;
 +        if (isSGPR) {
 +          MaxSGPR = maxUsed > MaxSGPR ? maxUsed : MaxSGPR;
@@ -820,61 +358,6 @@ index 0000000..3812282
 +} // End namespace llvm
 +
 +#endif //AMDGPU_ASMPRINTER_H
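
[Illustrative aside, not part of the commit: the EmitProgramInfo loop in
AMDGPUAsmPrinter.cpp above computes the shader's register budget by mapping
each register class to a width in 32-bit registers (1, 2, 4, 8 or 16),
masking the encoding to its low 8 bits to get the hardware index, and
tracking the highest index touched. A minimal standalone C++ sketch of that
arithmetic, with hypothetical names:]

    #include <algorithm>
    #include <cstdint>

    // Width = number of consecutive 32-bit registers a value occupies.
    struct RegUse { uint32_t Encoding; unsigned Width; bool IsSGPR; };

    // Mirrors the max-tracking in AMDGPUAsmPrinter::EmitProgramInfo.
    void trackMax(const RegUse &U, unsigned &MaxSGPR, unsigned &MaxVGPR) {
      unsigned HwReg   = U.Encoding & 0xff;    // low 8 bits hold the index
      unsigned MaxUsed = HwReg + U.Width - 1;  // last register of the tuple
      unsigned &Max = U.IsSGPR ? MaxSGPR : MaxVGPR;
      Max = std::max(Max, MaxUsed);
    }

    // e.g. an SReg_64 pair encoded at index 6 covers registers 6..7, so
    // MaxSGPR becomes at least 7.
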
-diff --git a/lib/Target/R600/AMDGPUCodeEmitter.h b/lib/Target/R600/AMDGPUCodeEmitter.h
-new file mode 100644
-index 0000000..84f3588
---- /dev/null
-+++ b/lib/Target/R600/AMDGPUCodeEmitter.h
-@@ -0,0 +1,49 @@
-+//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
-+//
-+//                     The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief CodeEmitter interface for R600 and SI codegen.
-+//
-+//===----------------------------------------------------------------------===//
-+
-+#ifndef AMDGPUCODEEMITTER_H
-+#define AMDGPUCODEEMITTER_H
-+
-+namespace llvm {
-+
-+class AMDGPUCodeEmitter {
-+public:
-+  uint64_t getBinaryCodeForInstr(const MachineInstr &MI) const;
-+  virtual uint64_t getMachineOpValue(const MachineInstr &MI,
-+                                   const MachineOperand &MO) const { return 0; }
-+  virtual unsigned GPR4AlignEncode(const MachineInstr  &MI,
-+                                     unsigned OpNo) const {
-+    return 0;
-+  }
-+  virtual unsigned GPR2AlignEncode(const MachineInstr &MI,
-+                                   unsigned OpNo) const {
-+    return 0;
-+  }
-+  virtual uint64_t VOPPostEncode(const MachineInstr &MI,
-+                                 uint64_t Value) const {
-+    return Value;
-+  }
-+  virtual uint64_t i32LiteralEncode(const MachineInstr &MI,
-+                                    unsigned OpNo) const {
-+    return 0;
-+  }
-+  virtual uint32_t SMRDmemriEncode(const MachineInstr &MI, unsigned OpNo)
-+                                                                   const {
-+    return 0;
-+  }
-+};
-+
-+} // End namespace llvm
-+
-+#endif // AMDGPUCODEEMITTER_H
 diff --git a/lib/Target/R600/AMDGPUConvertToISA.cpp b/lib/Target/R600/AMDGPUConvertToISA.cpp
 new file mode 100644
 index 0000000..50297d1
@@ -943,12 +426,190 @@ index 0000000..50297d1
 +  }
 +  return false;
 +}
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.cpp b/lib/Target/R600/AMDGPUFrameLowering.cpp
+new file mode 100644
+index 0000000..a3b6936
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUFrameLowering.cpp
+@@ -0,0 +1,122 @@
++//===----------------------- AMDGPUFrameLowering.cpp ----------------------===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//==-----------------------------------------------------------------------===//
++//
++// Interface to describe the layout of a stack frame on an AMDIL target machine
++//
++//===----------------------------------------------------------------------===//
++#include "AMDGPUFrameLowering.h"
++#include "AMDGPURegisterInfo.h"
++#include "R600MachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/Instructions.h"
++
++using namespace llvm;
++AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
++    int LAO, unsigned TransAl)
++  : TargetFrameLowering(D, StackAl, LAO, TransAl) { }
++
++AMDGPUFrameLowering::~AMDGPUFrameLowering() { }
++
++unsigned AMDGPUFrameLowering::getStackWidth(const MachineFunction &MF) const {
++
++  // XXX: Hardcoding to 1 for now.
++  //
++  // I think the StackWidth should be stored as metadata associated with the
++  // MachineFunction.  This metadata can either be added by a frontend, or
++  // calculated by a R600 specific LLVM IR pass.
++  //
++  // The StackWidth determines how stack objects are laid out in memory.
++  // For a vector stack variable, like: int4 stack[2], the data will be stored
++  // in the following ways depending on the StackWidth.
++  //
++  // StackWidth = 1:
++  //
++  // T0.X = stack[0].x
++  // T1.X = stack[0].y
++  // T2.X = stack[0].z
++  // T3.X = stack[0].w
++  // T4.X = stack[1].x
++  // T5.X = stack[1].y
++  // T6.X = stack[1].z
++  // T7.X = stack[1].w
++  //
++  // StackWidth = 2:
++  //
++  // T0.X = stack[0].x
++  // T0.Y = stack[0].y
++  // T1.X = stack[0].z
++  // T1.Y = stack[0].w
++  // T2.X = stack[1].x
++  // T2.Y = stack[1].y
++  // T3.X = stack[1].z
++  // T3.Y = stack[1].w
++  // 
++  // StackWidth = 4:
++  // T0.X = stack[0].x
++  // T0.Y = stack[0].y
++  // T0.Z = stack[0].z
++  // T0.W = stack[0].w
++  // T1.X = stack[1].x
++  // T1.Y = stack[1].y
++  // T1.Z = stack[1].z
++  // T1.W = stack[1].w
++  return 1;
++}
++
++/// \returns The offset, in registers, of the object at frame index \p FI.
++int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
++                                         int FI) const {
++  const MachineFrameInfo *MFI = MF.getFrameInfo();
++  unsigned Offset = 0;
++  int UpperBound = FI == -1 ? MFI->getNumObjects() : FI;
++
++  for (int i = MFI->getObjectIndexBegin(); i < UpperBound; ++i) {
++    const AllocaInst *Alloca = MFI->getObjectAllocation(i);
++    unsigned ArrayElements;
++    const Type *AllocaType = Alloca->getAllocatedType();
++    const Type *ElementType;
++
++    if (AllocaType->isArrayTy()) {
++      ArrayElements = AllocaType->getArrayNumElements();
++      ElementType = AllocaType->getArrayElementType();
++    } else {
++      ArrayElements = 1;
++      ElementType = AllocaType;
++    }
++
++    unsigned VectorElements;
++    if (ElementType->isVectorTy()) {
++      VectorElements = ElementType->getVectorNumElements();
++    } else {
++      VectorElements = 1;
++    }
++
++    Offset += (VectorElements / getStackWidth(MF)) * ArrayElements;
++  }
++  return Offset;
++}
++
++const TargetFrameLowering::SpillSlot *
++AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
++  NumEntries = 0;
++  return 0;
++}
++void
++AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
++}
++void
++AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF,
++                                  MachineBasicBlock &MBB) const {
++}
++
++bool
++AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
++  return false;
++}
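
[Illustrative aside, not part of the commit: getFrameIndexOffset above sizes
each stack object in register slots as (VectorElements / StackWidth) *
ArrayElements and sums the objects placed before the requested frame index.
A minimal re-implementation outside of LLVM, using the hardcoded StackWidth
of 1:]

    #include <vector>

    struct StackObject { unsigned VectorElements; unsigned ArrayElements; };

    // Offset (in registers) of the object at frame index FI.
    unsigned frameIndexOffset(const std::vector<StackObject> &Objects,
                              unsigned FI, unsigned StackWidth) {
      unsigned Offset = 0;
      for (unsigned i = 0; i < FI; ++i)
        Offset += (Objects[i].VectorElements / StackWidth) *
                  Objects[i].ArrayElements;
      return Offset;
    }

    // With StackWidth = 1, an `int4 stack[2]` alloca takes (4 / 1) * 2 = 8
    // slots (T0.X..T7.X in the comment above), so the next object starts
    // at offset 8: frameIndexOffset({{4, 2}, {1, 1}}, 1, 1) == 8.
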
+diff --git a/lib/Target/R600/AMDGPUFrameLowering.h b/lib/Target/R600/AMDGPUFrameLowering.h
+new file mode 100644
+index 0000000..cf5742e
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUFrameLowering.h
+@@ -0,0 +1,44 @@
++//===--------------------- AMDGPUFrameLowering.h ----------------*- C++ -*-===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Interface to describe the layout of a stack frame on an AMDIL target
++/// machine.
++//
++//===----------------------------------------------------------------------===//
++#ifndef AMDILFRAME_LOWERING_H
++#define AMDILFRAME_LOWERING_H
++
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/Target/TargetFrameLowering.h"
++
++namespace llvm {
++
++/// \brief Information about the stack frame layout on the AMDGPU targets.
++///
++/// It holds the direction of the stack growth, the known stack alignment on
++/// entry to each function, and the offset to the locals area.
++/// See TargetFrameInfo for more comments.
++class AMDGPUFrameLowering : public TargetFrameLowering {
++public:
++  AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
++                      unsigned TransAl = 1);
++  virtual ~AMDGPUFrameLowering();
++
++  /// \returns The number of 32-bit sub-registers that are used when storing
++  /// values to the stack.
++  virtual unsigned getStackWidth(const MachineFunction &MF) const;
++  virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
++  virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
++  virtual void emitPrologue(MachineFunction &MF) const;
++  virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
++  virtual bool hasFP(const MachineFunction &MF) const;
++};
++} // namespace llvm
++#endif // AMDILFRAME_LOWERING_H
 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
 new file mode 100644
-index 0000000..473dac4
+index 0000000..d0d23d6
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
-@@ -0,0 +1,417 @@
+@@ -0,0 +1,418 @@
 +//===-- AMDGPUISelLowering.cpp - AMDGPU Common DAG lowering functions -----===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -1361,17 +1022,18 @@ index 0000000..473dac4
 +  NODE_NAME_CASE(SMIN)
 +  NODE_NAME_CASE(UMIN)
 +  NODE_NAME_CASE(URECIP)
-+  NODE_NAME_CASE(INTERP)
-+  NODE_NAME_CASE(INTERP_P0)
 +  NODE_NAME_CASE(EXPORT)
++  NODE_NAME_CASE(CONST_ADDRESS)
++  NODE_NAME_CASE(REGISTER_LOAD)
++  NODE_NAME_CASE(REGISTER_STORE)
 +  }
 +}
 diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
 new file mode 100644
-index 0000000..c7abaf6
+index 0000000..99a11ff
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUISelLowering.h
-@@ -0,0 +1,144 @@
+@@ -0,0 +1,140 @@
 +//===-- AMDGPUISelLowering.h - AMDGPU Lowering Interface --------*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -1427,6 +1089,11 @@ index 0000000..c7abaf6
 +                              const SmallVectorImpl<ISD::OutputArg> &Outs,
 +                              const SmallVectorImpl<SDValue> &OutVals,
 +                              DebugLoc DL, SelectionDAG &DAG) const;
++  virtual SDValue LowerCall(CallLoweringInfo &CLI,
++                            SmallVectorImpl<SDValue> &InVals) const {
++    CLI.Callee.dump();
++    llvm_unreachable("Undefined function");
++  }
 +
 +  virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
 +  SDValue LowerIntrinsicIABS(SDValue Op, SelectionDAG &DAG) const;
@@ -1494,35 +1161,26 @@ index 0000000..c7abaf6
 +  SMIN,
 +  UMIN,
 +  URECIP,
-+  INTERP,
-+  INTERP_P0,
 +  EXPORT,
++  CONST_ADDRESS,
++  REGISTER_LOAD,
++  REGISTER_STORE,
 +  LAST_AMDGPU_ISD_NUMBER
 +};
 +
 +
 +} // End namespace AMDGPUISD
 +
-+namespace SIISD {
-+
-+enum {
-+  SI_FIRST = AMDGPUISD::LAST_AMDGPU_ISD_NUMBER,
-+  VCC_AND,
-+  VCC_BITCAST
-+};
-+
-+} // End namespace SIISD
-+
 +} // End namespace llvm
 +
 +#endif // AMDGPUISELLOWERING_H
-diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
+diff --git a/lib/Target/R600/AMDGPUIndirectAddressing.cpp b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
 new file mode 100644
-index 0000000..e42a46d
+index 0000000..15840b3
 --- /dev/null
-+++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
-@@ -0,0 +1,257 @@
-+//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
++++ b/lib/Target/R600/AMDGPUIndirectAddressing.cpp
+@@ -0,0 +1,344 @@
++//===-- AMDGPUIndirectAddressing.cpp - Indirect Addressing Support --------===//
 +//
 +//                     The LLVM Compiler Infrastructure
 +//
@@ -1532,60 +1190,410 @@ index 0000000..e42a46d
 +//===----------------------------------------------------------------------===//
 +//
 +/// \file
-+/// \brief Implementation of the TargetInstrInfo class that is common to all
-+/// AMD GPUs.
++///
++/// Instructions can use indirect addressing to index the register file as if it
++/// were memory.  This pass lowers RegisterLoad and RegisterStore instructions
++/// to either a COPY or a MOV that uses indirect addressing.
 +//
 +//===----------------------------------------------------------------------===//
 +
-+#include "AMDGPUInstrInfo.h"
-+#include "AMDGPURegisterInfo.h"
-+#include "AMDGPUTargetMachine.h"
-+#include "AMDIL.h"
-+#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "AMDGPU.h"
++#include "R600InstrInfo.h"
++#include "R600MachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
 +#include "llvm/CodeGen/MachineInstrBuilder.h"
 +#include "llvm/CodeGen/MachineRegisterInfo.h"
-+
-+#define GET_INSTRINFO_CTOR
-+#include "AMDGPUGenInstrInfo.inc"
++#include "llvm/Support/Debug.h"
 +
 +using namespace llvm;
 +
-+AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
-+  : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
++namespace {
 +
-+const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
-+  return RI;
-+}
++class AMDGPUIndirectAddressingPass : public MachineFunctionPass {
 +
-+bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
-+                                           unsigned &SrcReg, unsigned &DstReg,
-+                                           unsigned &SubIdx) const {
-+// TODO: Implement this function
-+  return false;
-+}
++private:
++  static char ID;
++  const AMDGPUInstrInfo *TII;
 +
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
-+                                             int &FrameIndex) const {
-+// TODO: Implement this function
-+  return 0;
-+}
++  bool regHasExplicitDef(MachineRegisterInfo &MRI, unsigned Reg) const;
 +
-+unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
-+                                                   int &FrameIndex) const {
-+// TODO: Implement this function
-+  return 0;
-+}
++public:
++  AMDGPUIndirectAddressingPass(TargetMachine &tm) :
++    MachineFunctionPass(ID),
++    TII(static_cast<const AMDGPUInstrInfo*>(tm.getInstrInfo()))
++    { }
 +
-+bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
-+                                          const MachineMemOperand *&MMO,
-+                                          int &FrameIndex) const {
-+// TODO: Implement this function
-+  return false;
++  virtual bool runOnMachineFunction(MachineFunction &MF);
++
++  const char *getPassName() const { return "R600 Handle indirect addressing"; }
++
++};
++
++} // End anonymous namespace
++
++char AMDGPUIndirectAddressingPass::ID = 0;
++
++FunctionPass *llvm::createAMDGPUIndirectAddressingPass(TargetMachine &tm) {
++  return new AMDGPUIndirectAddressingPass(tm);
 +}
-+unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
-+                                              int &FrameIndex) const {
-+// TODO: Implement this function
-+  return 0;
++
++bool AMDGPUIndirectAddressingPass::runOnMachineFunction(MachineFunction &MF) {
++  MachineRegisterInfo &MRI = MF.getRegInfo();
++
++  int IndirectBegin = TII->getIndirectIndexBegin(MF);
++  int IndirectEnd = TII->getIndirectIndexEnd(MF);
++
++  if (IndirectBegin == -1) {
++    // No indirect addressing, we can skip this pass
++    assert(IndirectEnd == -1);
++    return false;
++  }
++
++  // The map keeps track of the indirect address that is represented by
++  // each virtual register. The key is the register and the value is the
++  // indirect address it uses.
++  std::map<unsigned, unsigned> RegisterAddressMap;
++
++  // First pass - Lower all of the RegisterStore instructions and track which
++  // registers are live.
++  for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++                                                      BB != BB_E; ++BB) {
++    // This map keeps track of the current live indirect registers.
++    // The key is the address and the value is the register
++    std::map<unsigned, unsigned> LiveAddressRegisterMap;
++    MachineBasicBlock &MBB = *BB;
++
++    for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
++                               I != MBB.end(); I = Next) {
++      Next = llvm::next(I);
++      MachineInstr &MI = *I;
++
++      if (!TII->isRegisterStore(MI)) {
++        continue;
++      }
++
++      // Lower RegisterStore
++
++      unsigned RegIndex = MI.getOperand(2).getImm();
++      unsigned Channel = MI.getOperand(3).getImm();
++      unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
++      const TargetRegisterClass *IndirectStoreRegClass =
++                   TII->getIndirectAddrStoreRegClass(MI.getOperand(0).getReg());
++
++      if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
++        // Direct register access.
++        unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
++
++        BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY), DstReg)
++                .addOperand(MI.getOperand(0));
++
++        RegisterAddressMap[DstReg] = Address;
++        LiveAddressRegisterMap[Address] = DstReg;
++      } else {
++        // Indirect register access.
++        MachineInstrBuilder MOV = TII->buildIndirectWrite(BB, I,
++                                           MI.getOperand(0).getReg(), // Value
++                                           Address,
++                                           MI.getOperand(1).getReg()); // Offset
++        for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
++          unsigned Addr = TII->calculateIndirectAddress(i, Channel);
++          unsigned DstReg = MRI.createVirtualRegister(IndirectStoreRegClass);
++          MOV.addReg(DstReg, RegState::Define | RegState::Implicit);
++          RegisterAddressMap[DstReg] = Addr;
++          LiveAddressRegisterMap[Addr] = DstReg;
++        }
++      }
++      MI.eraseFromParent();
++    }
++
++    // Update the live-ins of the successor blocks
++    for (MachineBasicBlock::succ_iterator Succ = MBB.succ_begin(),
++                                          SuccEnd = MBB.succ_end();
++                                          SuccEnd != Succ; ++Succ) {
++      std::map<unsigned, unsigned>::const_iterator Key, KeyEnd;
++      for (Key = LiveAddressRegisterMap.begin(),
++           KeyEnd = LiveAddressRegisterMap.end(); KeyEnd != Key; ++Key) {
++        (*Succ)->addLiveIn(Key->second);
++      }
++    }
++  }
++
++  // Second pass - Lower the RegisterLoad instructions
++  for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++                                                      BB != BB_E; ++BB) {
++    // Key is the address and the value is the register
++    std::map<unsigned, unsigned> LiveAddressRegisterMap;
++    MachineBasicBlock &MBB = *BB;
++
++    MachineBasicBlock::livein_iterator LI = MBB.livein_begin();
++    while (LI != MBB.livein_end()) {
++      std::vector<unsigned> PhiRegisters;
++
++      // Make sure this live in is used for indirect addressing
++      if (RegisterAddressMap.find(*LI) == RegisterAddressMap.end()) {
++        ++LI;
++        continue;
++      }
++
++      unsigned Address = RegisterAddressMap[*LI];
++      LiveAddressRegisterMap[Address] = *LI;
++      PhiRegisters.push_back(*LI);
++
++      // Check if there are other live in registers which map to the same
++      // indirect address.
++      for (MachineBasicBlock::livein_iterator LJ = llvm::next(LI),
++                                              LE = MBB.livein_end();
++                                              LJ != LE; ++LJ) {
++        unsigned Reg = *LJ;
++        if (RegisterAddressMap.find(Reg) == RegisterAddressMap.end()) {
++          continue;
++        }
++
++        if (RegisterAddressMap[Reg] == Address) {
++          PhiRegisters.push_back(Reg);
++        }
++      }
++
++      if (PhiRegisters.size() == 1) {
++        // We don't need to insert a Phi instruction, so we can just add the
++        // registers to the live list for the block.
++        LiveAddressRegisterMap[Address] = *LI;
++        MBB.removeLiveIn(*LI);
++      } else {
++        // We need to insert a PHI, because we have the same address being
++        // written in multiple predecessor blocks.
++        const TargetRegisterClass *PhiDstClass =
++                   TII->getIndirectAddrStoreRegClass(*(PhiRegisters.begin()));
++        unsigned PhiDstReg = MRI.createVirtualRegister(PhiDstClass);
++        MachineInstrBuilder Phi = BuildMI(MBB, MBB.begin(),
++                                          MBB.findDebugLoc(MBB.begin()),
++                                          TII->get(AMDGPU::PHI), PhiDstReg);
++
++        for (std::vector<unsigned>::const_iterator RI = PhiRegisters.begin(),
++                                                   RE = PhiRegisters.end();
++                                                   RI != RE; ++RI) {
++          unsigned Reg = *RI;
++          MachineInstr *DefInst = MRI.getVRegDef(Reg);
++          assert(DefInst);
++          MachineBasicBlock *RegBlock = DefInst->getParent();
++          Phi.addReg(Reg);
++          Phi.addMBB(RegBlock);
++          MBB.removeLiveIn(Reg);
++        }
++        RegisterAddressMap[PhiDstReg] = Address;
++        LiveAddressRegisterMap[Address] = PhiDstReg;
++      }
++      LI = MBB.livein_begin();
++    }
++
++    for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
++                               I != MBB.end(); I = Next) {
++      Next = llvm::next(I);
++      MachineInstr &MI = *I;
++
++      if (!TII->isRegisterLoad(MI)) {
++        if (MI.getOpcode() == AMDGPU::PHI) {
++          continue;
++        }
++        // Check for indirect register defs
++        for (unsigned OpIdx = 0, NumOperands = MI.getNumOperands();
++                                 OpIdx < NumOperands; ++OpIdx) {
++          MachineOperand &MO = MI.getOperand(OpIdx);
++          if (MO.isReg() && MO.isDef() &&
++              RegisterAddressMap.find(MO.getReg()) != RegisterAddressMap.end()) {
++            unsigned Reg = MO.getReg();
++            unsigned LiveAddress = RegisterAddressMap[Reg];
++            // Chain the live-ins
++            if (LiveAddressRegisterMap.find(LiveAddress) !=
++                                                     RegisterAddressMap.end()) {
++              MI.addOperand(MachineOperand::CreateReg(
++                                  LiveAddressRegisterMap[LiveAddress],
++                                  false, // isDef
++                                  true,  // isImp
++                                  true));  // isKill
++            }
++            LiveAddressRegisterMap[LiveAddress] = Reg;
++          }
++        }
++        continue;
++      }
++
++      const TargetRegisterClass *SuperIndirectRegClass =
++                                                TII->getSuperIndirectRegClass();
++      const TargetRegisterClass *IndirectLoadRegClass =
++                                             TII->getIndirectAddrLoadRegClass();
++      unsigned IndirectReg = MRI.createVirtualRegister(SuperIndirectRegClass);
++
++      unsigned RegIndex = MI.getOperand(2).getImm();
++      unsigned Channel = MI.getOperand(3).getImm();
++      unsigned Address = TII->calculateIndirectAddress(RegIndex, Channel);
++
++      if (MI.getOperand(1).getReg() == AMDGPU::INDIRECT_BASE_ADDR) {
++        // Direct register access
++        unsigned Reg = LiveAddressRegisterMap[Address];
++        unsigned AddrReg = IndirectLoadRegClass->getRegister(Address);
++
++        if (regHasExplicitDef(MRI, Reg)) {
++          // If the register we are reading from has an explicit def, then that
++          // means it was written via a direct register access (i.e. COPY
++          // or other instruction that doesn't use indirect addressing).  In
++          // this case we know where the value has been stored, so we can just
++          // issue a copy.
++          BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
++                  MI.getOperand(0).getReg())
++                  .addReg(Reg);
++        } else {
++          // If the register we are reading has an implicit def, then that
++          // means it was written by an indirect register access (i.e. An
++          // instruction that uses indirect addressing. 
++          BuildMI(MBB, I, MBB.findDebugLoc(I), TII->get(AMDGPU::COPY),
++                   MI.getOperand(0).getReg())
++                   .addReg(AddrReg)
++                   .addReg(Reg, RegState::Implicit);
++        }
++      } else {
++        // Indirect register access
++
++        // Note on REG_SEQUENCE instructions: You can't actually use the register
++        // it defines unless you have an instruction that takes the defined
++        // register class as an operand.
++
++        MachineInstrBuilder Sequence = BuildMI(MBB, I, MBB.findDebugLoc(I),
++                                               TII->get(AMDGPU::REG_SEQUENCE),
++                                               IndirectReg);
++        for (int i = IndirectBegin; i <= IndirectEnd; ++i) {
++          unsigned Addr = TII->calculateIndirectAddress(i, Channel);
++          if (LiveAddressRegisterMap.find(Addr) == LiveAddressRegisterMap.end()) {
++            continue;
++          }
++          unsigned Reg = LiveAddressRegisterMap[Addr];
++
++          // We only need to use REG_SEQUENCE for explicit defs, since the
++          // register coalescer won't do anything with the implicit defs.
++          MachineInstr *DefInstr = MRI.getVRegDef(Reg);
++          if (!regHasExplicitDef(MRI, Reg)) {
++            continue;
++          }
++
++          // Insert a REG_SEQUENCE instruction to force the register allocator
++          // to allocate the virtual register to the correct physical register.
++          Sequence.addReg(LiveAddressRegisterMap[Addr]);
++          Sequence.addImm(TII->getRegisterInfo().getIndirectSubReg(Addr));
++        }
++        MachineInstrBuilder Mov = TII->buildIndirectRead(BB, I,
++                                           MI.getOperand(0).getReg(), // Value
++                                           Address,
++                                           MI.getOperand(1).getReg()); // Offset
++
++        Mov.addReg(IndirectReg, RegState::Implicit | RegState::Kill);
++        Mov.addReg(LiveAddressRegisterMap[Address], RegState::Implicit);
++
++      }
++      MI.eraseFromParent();
++    }
++  }
++  return false;
++}
++
++bool AMDGPUIndirectAddressingPass::regHasExplicitDef(MachineRegisterInfo &MRI,
++                                                  unsigned Reg) const {
++  MachineInstr *DefInstr = MRI.getVRegDef(Reg);
++
++  if (!DefInstr) {
++    return false;
++  }
++
++  if (DefInstr->getOpcode() == AMDGPU::PHI) {
++    bool Explicit = false;
++    for (MachineInstr::const_mop_iterator I = DefInstr->operands_begin(),
++                                          E = DefInstr->operands_end();
++                                          I != E; ++I) {
++      const MachineOperand &MO = *I;
++      if (!MO.isReg() || MO.isDef()) {
++        continue;
++      }
++
++      Explicit = Explicit || regHasExplicitDef(MRI, MO.getReg());
++    }
++    return Explicit;
++  }
++
++  return DefInstr->getOperand(0).isReg() &&
++         DefInstr->getOperand(0).getReg() == Reg;
++}
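
A standalone sketch of the address flattening the pass above relies on:
calculateIndirectAddress() maps a (RegIndex, Channel) pair to one slot of
a virtual address space. The channel-major layout below is hypothetical;
the real mapping is whatever the AMDGPUInstrInfo subclass implements.

    #include <cassert>
    #include <cstdio>

    // Hypothetical layout: four channels (x, y, z, w) per register,
    // flattened channel-major into consecutive slots.
    const unsigned kChannels = 4;

    unsigned calculateIndirectAddress(unsigned RegIndex, unsigned Channel) {
      assert(Channel < kChannels && "channel out of range");
      return RegIndex * kChannels + Channel;
    }

    int main() {
      // Register 2, channel z -> slot 10 under this layout.
      std::printf("%u\n", calculateIndirectAddress(2, 2));
      return 0;
    }

Whatever the concrete layout, the only property the pass needs is that the
mapping is injective, so each (register, channel) pair owns one slot.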
+diff --git a/lib/Target/R600/AMDGPUInstrInfo.cpp b/lib/Target/R600/AMDGPUInstrInfo.cpp
+new file mode 100644
+index 0000000..640707d
+--- /dev/null
++++ b/lib/Target/R600/AMDGPUInstrInfo.cpp
+@@ -0,0 +1,266 @@
++//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Implementation of the TargetInstrInfo class that is common to all
++/// AMD GPUs.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPUInstrInfo.h"
++#include "AMDGPURegisterInfo.h"
++#include "AMDGPUTargetMachine.h"
++#include "AMDIL.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++
++#define GET_INSTRINFO_CTOR
++#include "AMDGPUGenInstrInfo.inc"
++
++using namespace llvm;
++
++AMDGPUInstrInfo::AMDGPUInstrInfo(TargetMachine &tm)
++  : AMDGPUGenInstrInfo(0,0), RI(tm, *this), TM(tm) { }
++
++const AMDGPURegisterInfo &AMDGPUInstrInfo::getRegisterInfo() const {
++  return RI;
++}
++
++bool AMDGPUInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
++                                           unsigned &SrcReg, unsigned &DstReg,
++                                           unsigned &SubIdx) const {
++// TODO: Implement this function
++  return false;
++}
++
++unsigned AMDGPUInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
++                                             int &FrameIndex) const {
++// TODO: Implement this function
++  return 0;
++}
++
++unsigned AMDGPUInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
++                                                   int &FrameIndex) const {
++// TODO: Implement this function
++  return 0;
++}
++
++bool AMDGPUInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
++                                          const MachineMemOperand *&MMO,
++                                          int &FrameIndex) const {
++// TODO: Implement this function
++  return false;
++}
++unsigned AMDGPUInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
++                                              int &FrameIndex) const {
++// TODO: Implement this function
++  return 0;
 +}
 +unsigned AMDGPUInstrInfo::isStoreFromStackSlotPostFE(const MachineInstr *MI,
 +                                                    int &FrameIndex) const {
@@ -1758,7 +1766,16 @@ index 0000000..e42a46d
 +  // TODO: Implement this function
 +  return true;
 +}
-+ 
++
++bool AMDGPUInstrInfo::isRegisterStore(const MachineInstr &MI) const {
++  return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_STORE;
++}
++
++bool AMDGPUInstrInfo::isRegisterLoad(const MachineInstr &MI) const {
++  return get(MI.getOpcode()).TSFlags & AMDGPU_FLAG_REGISTER_LOAD;
++}
++
++
 +void AMDGPUInstrInfo::convertToISA(MachineInstr & MI, MachineFunction &MF,
 +    DebugLoc DL) const {
 +  MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -1781,10 +1798,10 @@ index 0000000..e42a46d
 +}
 diff --git a/lib/Target/R600/AMDGPUInstrInfo.h b/lib/Target/R600/AMDGPUInstrInfo.h
 new file mode 100644
-index 0000000..32ac691
+index 0000000..5220aa0
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUInstrInfo.h
-@@ -0,0 +1,149 @@
+@@ -0,0 +1,207 @@
 +//===-- AMDGPUInstrInfo.h - AMDGPU Instruction Information ------*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -1828,9 +1845,10 @@ index 0000000..32ac691
 +class AMDGPUInstrInfo : public AMDGPUGenInstrInfo {
 +private:
 +  const AMDGPURegisterInfo RI;
-+  TargetMachine &TM;
 +  bool getNextBranchInstr(MachineBasicBlock::iterator &iter,
 +                          MachineBasicBlock &MBB) const;
++protected:
++  TargetMachine &TM;
 +public:
 +  explicit AMDGPUInstrInfo(TargetMachine &tm);
 +
@@ -1918,12 +1936,66 @@ index 0000000..32ac691
 +  bool isAExtLoadInst(llvm::MachineInstr *MI) const;
 +  bool isStoreInst(llvm::MachineInstr *MI) const;
 +  bool isTruncStoreInst(llvm::MachineInstr *MI) const;
++  bool isRegisterStore(const MachineInstr &MI) const;
++  bool isRegisterLoad(const MachineInstr &MI) const;
++
++//===---------------------------------------------------------------------===//
++// Pure virtual functions to be implemented by sub-classes.
++//===---------------------------------------------------------------------===//
 +
 +  virtual MachineInstr* getMovImmInstr(MachineFunction *MF, unsigned DstReg,
 +                                       int64_t Imm) const = 0;
 +  virtual unsigned getIEQOpcode() const = 0;
 +  virtual bool isMov(unsigned opcode) const = 0;
 +
++  /// \returns the smallest register index that will be accessed by an indirect
++  /// read or write or -1 if indirect addressing is not used by this program.
++  virtual int getIndirectIndexBegin(const MachineFunction &MF) const = 0;
++
++  /// \returns the largest register index that will be accessed by an indirect
++  /// read or write or -1 if indirect addressing is not used by this program.
++  virtual int getIndirectIndexEnd(const MachineFunction &MF) const = 0;
++
++  /// \brief Calculate the "Indirect Address" for the given \p RegIndex and
++  ///        \p Channel
++  ///
++  /// We model indirect addressing using a virtual address space that can be
++/// accessed with loads and stores.  The "Indirect Address" is the memory
++  /// address in this virtual address space that maps to the given \p RegIndex
++  /// and \p Channel.
++  virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++                                            unsigned Channel) const = 0;
++
++  /// \returns The register class to be used for storing values to an
++  /// "Indirect Address" .
++  virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++                                                  unsigned SourceReg) const = 0;
++
++  /// \returns The register class to be used for loading values from
++  /// an "Indirect Address" .
++  virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const = 0;
++
++  /// \brief Build instruction(s) for an indirect register write.
++  ///
++  /// \returns The instruction that performs the indirect register write
++  virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++                                    MachineBasicBlock::iterator I,
++                                    unsigned ValueReg, unsigned Address,
++                                    unsigned OffsetReg) const = 0;
++
++  /// \brief Build instruction(s) for an indirect register read.
++  ///
++  /// \returns The instruction that performs the indirect register read
++  virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++                                    MachineBasicBlock::iterator I,
++                                    unsigned ValueReg, unsigned Address,
++                                    unsigned OffsetReg) const = 0;
++
++  /// \returns the register class whose sub registers are the set of all
++  /// possible registers that can be used for indirect addressing.
++  virtual const TargetRegisterClass *getSuperIndirectRegClass() const = 0;
++
++
 +  /// \brief Convert the AMDIL MachineInstr to a supported ISA
 +  /// MachineInstr
 +  virtual void convertToISA(MachineInstr & MI, MachineFunction &MF,
@@ -1933,13 +2005,16 @@ index 0000000..32ac691
 +
 +} // End llvm namespace
 +
++#define AMDGPU_FLAG_REGISTER_LOAD  (UINT64_C(1) << 63)
++#define AMDGPU_FLAG_REGISTER_STORE (UINT64_C(1) << 62)
++
 +#endif // AMDGPUINSTRINFO_H
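
The isRegisterLoad()/isRegisterStore() queries added above test single
bits of the 64-bit TSFlags word that TableGen packs for every opcode; the
two #defines pin those bits to 63 and 62. A minimal self-contained
illustration of the scheme, with a hand-written descriptor table standing
in for the generated AMDGPUGenInstrInfo.inc:

    #include <cstdint>
    #include <cstdio>

    const uint64_t FLAG_REGISTER_LOAD  = UINT64_C(1) << 63;
    const uint64_t FLAG_REGISTER_STORE = UINT64_C(1) << 62;

    // Stand-in for the TableGen-generated per-opcode descriptors.
    struct InstrDesc { const char *Name; uint64_t TSFlags; };

    bool isRegisterLoad(const InstrDesc &D) {
      return (D.TSFlags & FLAG_REGISTER_LOAD) != 0;
    }
    bool isRegisterStore(const InstrDesc &D) {
      return (D.TSFlags & FLAG_REGISTER_STORE) != 0;
    }

    int main() {
      const InstrDesc Descs[] = {
        { "RegisterLoad",  FLAG_REGISTER_LOAD  },
        { "RegisterStore", FLAG_REGISTER_STORE },
        { "MOV",           0                   },
      };
      for (const InstrDesc &D : Descs)
        std::printf("%s: load=%d store=%d\n", D.Name,
                    isRegisterLoad(D), isRegisterStore(D));
      return 0;
    }

The bit positions only work because the TableGen side agrees: the
"let TSFlags{63} = isRegisterLoad" assignments in AMDGPUInstructions.td
further down write the same two bits the C++ side reads.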
 diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
 new file mode 100644
-index 0000000..96368e8
+index 0000000..b66ae87
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUInstrInfo.td
-@@ -0,0 +1,74 @@
+@@ -0,0 +1,82 @@
 +//===-- AMDGPUInstrInfo.td - AMDGPU DAG nodes --------------*- tablegen -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2014,12 +2089,20 @@ index 0000000..96368e8
 +def AMDGPUurecip : SDNode<"AMDGPUISD::URECIP", SDTIntUnaryOp>;
 +
 +def fpow : SDNode<"ISD::FPOW", SDTFPBinOp>;
++
++def AMDGPUregister_load : SDNode<"AMDGPUISD::REGISTER_LOAD",
++                          SDTypeProfile<1, 2, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
++                          [SDNPHasChain, SDNPMayLoad]>;
++
++def AMDGPUregister_store : SDNode<"AMDGPUISD::REGISTER_STORE",
++                           SDTypeProfile<0, 3, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
++                           [SDNPHasChain, SDNPMayStore]>;
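
These two nodes are what lets the backend treat the indirectly addressed
part of the register file as memory: each carries a pointer-typed address
operand plus an integer channel, has a chain, and may load or store. The
semantics being modeled are no more exotic than a run-time-indexed array,
roughly (illustrative only):

    #include <cstdio>

    // The indirectly addressable registers behave like an array;
    // REGISTER_LOAD / REGISTER_STORE are reads and writes into it.
    static float RegFile[16];

    static void registerStore(float Val, unsigned Addr) { RegFile[Addr] = Val; }
    static float registerLoad(unsigned Addr) { return RegFile[Addr]; }

    int main() {
      unsigned i = 7;                       // index known only at run time
      registerStore(3.5f, i);               // REGISTER_STORE $val, $addr
      std::printf("%g\n", registerLoad(i)); // REGISTER_LOAD  $dst, $addr
      return 0;
    }

The RegisterLoadStore multiclass in AMDGPUInstructions.td below gives
these nodes matchable pseudo instructions, and the indirect addressing
pass later rewrites the pseudos into real machine instructions.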
 diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
 new file mode 100644
-index 0000000..e634d20
+index 0000000..0559a5a
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUInstructions.td
-@@ -0,0 +1,190 @@
+@@ -0,0 +1,268 @@
 +//===-- AMDGPUInstructions.td - Common instruction defs ---*- tablegen -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2035,8 +2118,8 @@ index 0000000..e634d20
 +//===----------------------------------------------------------------------===//
 +
 +class AMDGPUInst <dag outs, dag ins, string asm, list<dag> pattern> : Instruction {
-+  field bits<16> AMDILOp = 0;
-+  field bits<3> Gen = 0;
++  field bit isRegisterLoad = 0;
++  field bit isRegisterStore = 0;
 +
 +  let Namespace = "AMDGPU";
 +  let OutOperandList = outs;
@@ -2044,8 +2127,9 @@ index 0000000..e634d20
 +  let AsmString = asm;
 +  let Pattern = pattern;
 +  let Itinerary = NullALU;
-+  let TSFlags{42-40} = Gen;
-+  let TSFlags{63-48} = AMDILOp;
++
++  let TSFlags{63} = isRegisterLoad;
++  let TSFlags{62} = isRegisterStore;
 +}
 +
 +class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
@@ -2123,7 +2207,9 @@ index 0000000..e634d20
 +  [{return N->isExactlyValue(1.0);}]
 +>;
 +
-+let isCodeGenOnly = 1, isPseudo = 1, usesCustomInserter = 1  in {
++let isCodeGenOnly = 1, isPseudo = 1 in {
++
++let usesCustomInserter = 1  in {
 +
 +class CLAMP <RegisterClass rc> : AMDGPUShaderInst <
 +  (outs rc:$dst),
@@ -2153,7 +2239,31 @@ index 0000000..e634d20
 +  [(int_AMDGPU_shader_type imm:$type)]
 +>;
 +
-+} // End isCodeGenOnly = 1, isPseudo = 1, hasCustomInserter = 1
++} // usesCustomInserter = 1
++
++multiclass RegisterLoadStore <RegisterClass dstClass, Operand addrClass,
++                    ComplexPattern addrPat> {
++  def RegisterLoad : AMDGPUShaderInst <
++    (outs dstClass:$dst),
++    (ins addrClass:$addr, i32imm:$chan),
++    "RegisterLoad $dst, $addr",
++    [(set (i32 dstClass:$dst), (AMDGPUregister_load addrPat:$addr,
++                                                    (i32 timm:$chan)))]
++  > {
++    let isRegisterLoad = 1;
++  }
++
++  def RegisterStore : AMDGPUShaderInst <
++    (outs),
++    (ins dstClass:$val, addrClass:$addr, i32imm:$chan),
++    "RegisterStore $val, $addr",
++    [(AMDGPUregister_store (i32 dstClass:$val), addrPat:$addr, (i32 timm:$chan))]
++  > {
++    let isRegisterStore = 1;
++  }
++}
++
++} // End isCodeGenOnly = 1, isPseudo = 1
 +
 +/* Generic helper patterns for intrinsics */
 +/* -------------------------------------- */
@@ -2186,13 +2296,64 @@ index 0000000..e634d20
 +>;
 +
 +// Vector Build pattern
++class Vector1_Build <ValueType vecType, RegisterClass vectorClass,
++                     ValueType elemType, RegisterClass elemClass> : Pat <
++  (vecType (build_vector (elemType elemClass:$src))),
++  (vecType elemClass:$src)
++>;
++
++class Vector2_Build <ValueType vecType, RegisterClass vectorClass,
++                     ValueType elemType, RegisterClass elemClass> : Pat <
++  (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1))),
++  (INSERT_SUBREG (INSERT_SUBREG
++  (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1)
++>;
++
 +class Vector_Build <ValueType vecType, RegisterClass vectorClass,
 +                    ValueType elemType, RegisterClass elemClass> : Pat <
 +  (vecType (build_vector (elemType elemClass:$x), (elemType elemClass:$y),
 +                         (elemType elemClass:$z), (elemType elemClass:$w))),
 +  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
-+  (vecType (IMPLICIT_DEF)), elemClass:$x, sel_x), elemClass:$y, sel_y),
-+                            elemClass:$z, sel_z), elemClass:$w, sel_w)
++  (vecType (IMPLICIT_DEF)), elemClass:$x, sub0), elemClass:$y, sub1),
++                            elemClass:$z, sub2), elemClass:$w, sub3)
++>;
++
++class Vector8_Build <ValueType vecType, RegisterClass vectorClass,
++                     ValueType elemType, RegisterClass elemClass> : Pat <
++  (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
++                         (elemType elemClass:$sub2), (elemType elemClass:$sub3),
++                         (elemType elemClass:$sub4), (elemType elemClass:$sub5),
++                         (elemType elemClass:$sub6), (elemType elemClass:$sub7))),
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
++                            elemClass:$sub2, sub2), elemClass:$sub3, sub3),
++                            elemClass:$sub4, sub4), elemClass:$sub5, sub5),
++                            elemClass:$sub6, sub6), elemClass:$sub7, sub7)
++>;
++
++class Vector16_Build <ValueType vecType, RegisterClass vectorClass,
++                      ValueType elemType, RegisterClass elemClass> : Pat <
++  (vecType (build_vector (elemType elemClass:$sub0), (elemType elemClass:$sub1),
++                         (elemType elemClass:$sub2), (elemType elemClass:$sub3),
++                         (elemType elemClass:$sub4), (elemType elemClass:$sub5),
++                         (elemType elemClass:$sub6), (elemType elemClass:$sub7),
++                         (elemType elemClass:$sub8), (elemType elemClass:$sub9),
++                         (elemType elemClass:$sub10), (elemType elemClass:$sub11),
++                         (elemType elemClass:$sub12), (elemType elemClass:$sub13),
++                         (elemType elemClass:$sub14), (elemType elemClass:$sub15))),
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG
++  (vecType (IMPLICIT_DEF)), elemClass:$sub0, sub0), elemClass:$sub1, sub1),
++                            elemClass:$sub2, sub2), elemClass:$sub3, sub3),
++                            elemClass:$sub4, sub4), elemClass:$sub5, sub5),
++                            elemClass:$sub6, sub6), elemClass:$sub7, sub7),
++                            elemClass:$sub8, sub8), elemClass:$sub9, sub9),
++                            elemClass:$sub10, sub10), elemClass:$sub11, sub11),
++                            elemClass:$sub12, sub12), elemClass:$sub13, sub13),
++                            elemClass:$sub14, sub14), elemClass:$sub15, sub15)
 +>;
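
All of these Vector*_Build classes are the same fold at different widths:
start from an undefined super-register (IMPLICIT_DEF) and insert one
element per sub-register index. A rough C++ analogue of the fold, purely
illustrative since the real rewrite happens on SelectionDAG nodes:

    #include <array>
    #include <cstdio>

    // Start "undefined" (IMPLICIT_DEF), then write one lane at a time,
    // mirroring (INSERT_SUBREG ..., elemClass:$subN, subN).
    template <std::size_t N>
    std::array<float, N> buildVector(const float (&Elems)[N]) {
      std::array<float, N> Vec;             // intentionally left uninitialized
      for (std::size_t I = 0; I != N; ++I)
        Vec[I] = Elems[I];
      return Vec;
    }

    int main() {
      const float Elems[4] = {1.0f, 2.0f, 3.0f, 4.0f};
      std::array<float, 4> V = buildVector(Elems);
      std::printf("%g %g\n", V[0], V[3]);
      return 0;
    }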
 +
 +// bitconvert pattern
@@ -2409,10 +2570,10 @@ index 0000000..d7d538e
 +#endif //AMDGPU_MCINSTLOWER_H
 diff --git a/lib/Target/R600/AMDGPURegisterInfo.cpp b/lib/Target/R600/AMDGPURegisterInfo.cpp
 new file mode 100644
-index 0000000..eeafec8
+index 0000000..d62e57b
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPURegisterInfo.cpp
-@@ -0,0 +1,51 @@
+@@ -0,0 +1,74 @@
 +//===-- AMDGPURegisterInfo.cpp - AMDGPU Register Information -------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2462,14 +2623,37 @@ index 0000000..eeafec8
 +  return 0;
 +}
 +
++unsigned AMDGPURegisterInfo::getIndirectSubReg(unsigned IndirectIndex) const {
++
++  switch(IndirectIndex) {
++  case 0: return AMDGPU::sub0;
++  case 1: return AMDGPU::sub1;
++  case 2: return AMDGPU::sub2;
++  case 3: return AMDGPU::sub3;
++  case 4: return AMDGPU::sub4;
++  case 5: return AMDGPU::sub5;
++  case 6: return AMDGPU::sub6;
++  case 7: return AMDGPU::sub7;
++  case 8: return AMDGPU::sub8;
++  case 9: return AMDGPU::sub9;
++  case 10: return AMDGPU::sub10;
++  case 11: return AMDGPU::sub11;
++  case 12: return AMDGPU::sub12;
++  case 13: return AMDGPU::sub13;
++  case 14: return AMDGPU::sub14;
++  case 15: return AMDGPU::sub15;
++  default: llvm_unreachable("indirect index out of range");
++  }
++}
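
The switch above is exhaustive but mechanical. If the generated
sub0..sub15 enumerators are contiguous, the same lookup collapses to
arithmetic; the sketch below leans on that contiguity assumption (check
it against AMDGPUGenRegisterInfo.inc before relying on it), with a mock
enum standing in for the generated one:

    #include <cassert>
    #include <cstdio>

    namespace AMDGPU {
      // Mock of the TableGen-generated indices; contiguity is what makes
      // the arithmetic form below valid.
      enum { NoSubRegister = 0, sub0, sub1, sub2, sub3 /* ... sub15 */ };
    }

    unsigned getIndirectSubReg(unsigned IndirectIndex) {
      assert(IndirectIndex < 16 && "indirect index out of range");
      return AMDGPU::sub0 + IndirectIndex;
    }

    int main() {
      std::printf("%u %u\n", getIndirectSubReg(0), getIndirectSubReg(3));
      return 0;
    }

The explicit switch trades brevity for independence from the enum layout,
which is a defensible choice when the .inc file is regenerated often.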
++
 +#define GET_REGINFO_TARGET_DESC
 +#include "AMDGPUGenRegisterInfo.inc"
 diff --git a/lib/Target/R600/AMDGPURegisterInfo.h b/lib/Target/R600/AMDGPURegisterInfo.h
 new file mode 100644
-index 0000000..76ee7ae
+index 0000000..5007ff5
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPURegisterInfo.h
-@@ -0,0 +1,63 @@
+@@ -0,0 +1,65 @@
 +//===-- AMDGPURegisterInfo.h - AMDGPURegisterInfo Interface -*- C++ -*-----===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2528,6 +2712,8 @@ index 0000000..76ee7ae
 +                           RegScavenger *RS) const;
 +  unsigned getFrameRegister(const MachineFunction &MF) const;
 +
++  unsigned getIndirectSubReg(unsigned IndirectIndex) const;
++
 +};
 +
 +} // End namespace llvm
@@ -2535,10 +2721,10 @@ index 0000000..76ee7ae
 +#endif // AMDIDSAREGISTERINFO_H
 diff --git a/lib/Target/R600/AMDGPURegisterInfo.td b/lib/Target/R600/AMDGPURegisterInfo.td
 new file mode 100644
-index 0000000..8181e02
+index 0000000..b5aca03
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPURegisterInfo.td
-@@ -0,0 +1,22 @@
+@@ -0,0 +1,25 @@
 +//===-- AMDGPURegisterInfo.td - AMDGPU register info -------*- tablegen -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2553,20 +2739,23 @@ index 0000000..8181e02
 +//===----------------------------------------------------------------------===//
 +
 +let Namespace = "AMDGPU" in {
-+  def sel_x : SubRegIndex;
-+  def sel_y : SubRegIndex;
-+  def sel_z : SubRegIndex;
-+  def sel_w : SubRegIndex;
++
++foreach Index = 0-15 in {
++  def sub#Index : SubRegIndex;
++}
++
++def INDIRECT_BASE_ADDR : Register <"INDIRECT_BASE_ADDR">;
++
 +}
 +
 +include "R600RegisterInfo.td"
 +include "SIRegisterInfo.td"
 diff --git a/lib/Target/R600/AMDGPUStructurizeCFG.cpp b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
 new file mode 100644
-index 0000000..22338b5
+index 0000000..a8c9621
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUStructurizeCFG.cpp
-@@ -0,0 +1,714 @@
+@@ -0,0 +1,893 @@
 +//===-- AMDGPUStructurizeCFG.cpp -  ------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -2591,30 +2780,101 @@ index 0000000..22338b5
 +#include "llvm/Analysis/RegionInfo.h"
 +#include "llvm/Analysis/RegionPass.h"
 +#include "llvm/Transforms/Utils/SSAUpdater.h"
++#include "llvm/Support/PatternMatch.h"
 +
 +using namespace llvm;
++using namespace llvm::PatternMatch;
 +
 +namespace {
 +
 +// Definition of the complex types used in this pass.
 +
 +typedef std::pair<BasicBlock *, Value *> BBValuePair;
-+typedef ArrayRef<BasicBlock*> BBVecRef;
 +
 +typedef SmallVector<RegionNode*, 8> RNVector;
 +typedef SmallVector<BasicBlock*, 8> BBVector;
++typedef SmallVector<BranchInst*, 8> BranchVector;
 +typedef SmallVector<BBValuePair, 2> BBValueVector;
 +
++typedef SmallPtrSet<BasicBlock *, 8> BBSet;
++
 +typedef DenseMap<PHINode *, BBValueVector> PhiMap;
++typedef DenseMap<DomTreeNode *, unsigned> DTN2UnsignedMap;
 +typedef DenseMap<BasicBlock *, PhiMap> BBPhiMap;
 +typedef DenseMap<BasicBlock *, Value *> BBPredicates;
 +typedef DenseMap<BasicBlock *, BBPredicates> PredMap;
-+typedef DenseMap<BasicBlock *, unsigned> VisitedMap;
++typedef DenseMap<BasicBlock *, BasicBlock*> BB2BBMap;
++typedef DenseMap<BasicBlock *, BBVector> BB2BBVecMap;
 +
 +// The name for newly created blocks.
 +
 +static const char *FlowBlockName = "Flow";
 +
++/// @brief Find the nearest common dominator for multiple BasicBlocks
++///
++/// Helper class for AMDGPUStructurizeCFG
++/// TODO: Maybe move into common code
++class NearestCommonDominator {
++
++  DominatorTree *DT;
++
++  DTN2UnsignedMap IndexMap;
++
++  BasicBlock *Result;
++  unsigned ResultIndex;
++  bool ExplicitMentioned;
++
++public:
++  /// \brief Start a new query
++  NearestCommonDominator(DominatorTree *DomTree) {
++    DT = DomTree;
++    Result = 0;
++  }
++
++  /// \brief Add BB to the resulting dominator
++  void addBlock(BasicBlock *BB, bool Remember = true) {
++
++    DomTreeNode *Node = DT->getNode(BB);
++
++    if (Result == 0) {
++      unsigned Numbering = 0;
++      for (;Node;Node = Node->getIDom())
++        IndexMap[Node] = ++Numbering;
++      Result = BB;
++      ResultIndex = 1;
++      ExplicitMentioned = Remember;
++      return;
++    }
++
++    for (;Node;Node = Node->getIDom())
++      if (IndexMap.count(Node))
++        break;
++      else
++        IndexMap[Node] = 0;
++
++    assert(Node && "Dominator tree invalid!");
++
++    unsigned Numbering = IndexMap[Node];
++    if (Numbering > ResultIndex) {
++      Result = Node->getBlock();
++      ResultIndex = Numbering;
++      ExplicitMentioned = Remember && (Result == BB);
++    } else if (Numbering == ResultIndex) {
++      ExplicitMentioned |= Remember;
++    }
++  }
++
++  /// \brief Is "Result" one of the BBs added with "Remember" = True?
++  bool wasResultExplicitMentioned() {
++    return ExplicitMentioned;
++  }
++
++  /// \brief Get the query result
++  BasicBlock *getResult() {
++    return Result;
++  }
++};
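
The query works incrementally: the first block numbers its entire path to
the root (1 at the block itself, increasing toward the root), and every
later block just walks upward until it hits a numbered node. A larger
number means an ancestor nearer the root, so the running result only ever
moves up. The same idea on a toy parent-pointer tree, dropping the
Remember/ExplicitMentioned bookkeeping (plain ints stand in for blocks):

    #include <cstdio>
    #include <map>
    #include <vector>

    struct NearestCommonDom {
      const std::vector<int> &Parent; // Parent[n] = immediate dominator, -1 at root
      std::map<int, unsigned> Index;  // path numbering, like IndexMap above
      int Result;
      unsigned ResultIndex;

      explicit NearestCommonDom(const std::vector<int> &P)
        : Parent(P), Result(-1), ResultIndex(0) {}

      void addNode(int N) {
        if (Result < 0) {
          unsigned Numbering = 0;
          for (int I = N; I >= 0; I = Parent[I])
            Index[I] = ++Numbering;   // number the whole path to the root
          Result = N;
          ResultIndex = 1;
          return;
        }
        int I = N;
        while (!Index.count(I)) {     // climb until the numbered path is hit
          Index[I] = 0;
          I = Parent[I];
        }
        if (Index[I] > ResultIndex) { // larger index = closer to the root
          Result = I;
          ResultIndex = Index[I];
        }
      }
    };

    int main() {
      //      0
      //     / \
      //    1   2
      //   / \
      //  3   4
      std::vector<int> Parent = {-1, 0, 0, 1, 1};
      NearestCommonDom Q(Parent);
      Q.addNode(3);
      Q.addNode(4);
      std::printf("ncd(3,4)   = %d\n", Q.Result); // 1
      Q.addNode(2);
      std::printf("ncd(3,4,2) = %d\n", Q.Result); // 0
      return 0;
    }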
++
 +/// @brief Transforms the control flow graph on one single entry/exit region
 +/// at a time.
 +///
@@ -2675,45 +2935,62 @@ index 0000000..22338b5
 +  DominatorTree *DT;
 +
 +  RNVector Order;
-+  VisitedMap Visited;
-+  PredMap Predicates;
++  BBSet Visited;
++
 +  BBPhiMap DeletedPhis;
-+  BBVector FlowsInserted;
++  BB2BBVecMap AddedPhis;
++
++  PredMap Predicates;
++  BranchVector Conditions;
++
++  BB2BBMap Loops;
++  PredMap LoopPreds;
++  BranchVector LoopConds;
 +
-+  BasicBlock *LoopStart;
-+  BasicBlock *LoopEnd;
-+  BBPredicates LoopPred;
++  RegionNode *PrevNode;
 +
 +  void orderNodes();
 +
-+  void buildPredicate(BranchInst *Term, unsigned Idx,
-+                      BBPredicates &Pred, bool Invert);
++  void analyzeLoops(RegionNode *N);
 +
-+  void analyzeBlock(BasicBlock *BB);
++  Value *invert(Value *Condition);
 +
-+  void analyzeLoop(BasicBlock *BB, unsigned &LoopIdx);
++  Value *buildCondition(BranchInst *Term, unsigned Idx, bool Invert);
++
++  void gatherPredicates(RegionNode *N);
 +
 +  void collectInfos();
 +
-+  bool dominatesPredicates(BasicBlock *A, BasicBlock *B);
++  void insertConditions(bool Loops);
++
++  void delPhiValues(BasicBlock *From, BasicBlock *To);
++
++  void addPhiValues(BasicBlock *From, BasicBlock *To);
++
++  void setPhiValues();
 +
 +  void killTerminator(BasicBlock *BB);
 +
-+  RegionNode *skipChained(RegionNode *Node);
++  void changeExit(RegionNode *Node, BasicBlock *NewExit,
++                  bool IncludeDominator);
 +
-+  void delPhiValues(BasicBlock *From, BasicBlock *To);
++  BasicBlock *getNextFlow(BasicBlock *Dominator);
 +
-+  void addPhiValues(BasicBlock *From, BasicBlock *To);
++  BasicBlock *needPrefix(bool NeedEmpty);
 +
-+  BasicBlock *getNextFlow(BasicBlock *Prev);
++  BasicBlock *needPostfix(BasicBlock *Flow, bool ExitUseAllowed);
 +
-+  bool isPredictableTrue(BasicBlock *Prev, BasicBlock *Node);
++  void setPrevNode(BasicBlock *BB);
 +
-+  BasicBlock *wireFlowBlock(BasicBlock *Prev, RegionNode *Node);
++  bool dominatesPredicates(BasicBlock *BB, RegionNode *Node);
 +
-+  void createFlow();
++  bool isPredictableTrue(RegionNode *Node);
++
++  void wireFlow(bool ExitUseAllowed, BasicBlock *LoopEnd);
 +
-+  void insertConditions();
++  void handleLoops(bool ExitUseAllowed, BasicBlock *LoopEnd);
++
++  void createFlow();
 +
 +  void rebuildSSA();
 +
@@ -2767,212 +3044,214 @@ index 0000000..22338b5
 +  }
 +}
 +
-+/// \brief Build blocks and loop predicates
-+void AMDGPUStructurizeCFG::buildPredicate(BranchInst *Term, unsigned Idx,
-+                                          BBPredicates &Pred, bool Invert) {
-+  Value *True = Invert ? BoolFalse : BoolTrue;
-+  Value *False = Invert ? BoolTrue : BoolFalse;
++/// \brief Determine the end of the loops
++void AMDGPUStructurizeCFG::analyzeLoops(RegionNode *N) {
 +
-+  RegionInfo *RI = ParentRegion->getRegionInfo();
-+  BasicBlock *BB = Term->getParent();
++  if (N->isSubRegion()) {
++    // Test for exit as back edge
++    BasicBlock *Exit = N->getNodeAs<Region>()->getExit();
++    if (Visited.count(Exit))
++      Loops[Exit] = N->getEntry();
++
++  } else {
++    // Test for successors as back edges
++    BasicBlock *BB = N->getNodeAs<BasicBlock>();
++    BranchInst *Term = cast<BranchInst>(BB->getTerminator());
 +
-+  // Handle the case where multiple regions start at the same block
-+  Region *R = BB != ParentRegion->getEntry() ?
-+              RI->getRegionFor(BB) : ParentRegion;
++    for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
++      BasicBlock *Succ = Term->getSuccessor(i);
 +
-+  if (R == ParentRegion) {
-+    // It's a top level block in our region
-+    Value *Cond = True;
-+    if (Term->isConditional()) {
-+      BasicBlock *Other = Term->getSuccessor(!Idx);
++      if (Visited.count(Succ))
++        Loops[Succ] = BB;
++    }
++  }
++}
 +
-+      if (Visited.count(Other)) {
-+        if (!Pred.count(Other))
-+          Pred[Other] = False;
++/// \brief Invert the given condition
++Value *AMDGPUStructurizeCFG::invert(Value *Condition) {
 +
-+        if (!Pred.count(BB))
-+          Pred[BB] = True;
-+        return;
-+      }
-+      Cond = Term->getCondition();
++  // First: Check if it's a constant
++  if (Condition == BoolTrue)
++    return BoolFalse;
 +
-+      if (Idx != Invert)
-+        Cond = BinaryOperator::CreateNot(Cond, "", Term);
-+    }
++  if (Condition == BoolFalse)
++    return BoolTrue;
 +
-+    Pred[BB] = Cond;
++  if (Condition == BoolUndef)
++    return BoolUndef;
 +
-+  } else if (ParentRegion->contains(R)) {
-+    // It's a block in a sub region
-+    while(R->getParent() != ParentRegion)
-+      R = R->getParent();
++  // Second: If the condition is already inverted, return the original value
++  if (match(Condition, m_Not(m_Value(Condition))))
++    return Condition;
 +
-+    Pred[R->getEntry()] = True;
++  // Third: Check all the users for an invert
++  BasicBlock *Parent = cast<Instruction>(Condition)->getParent();
++  for (Value::use_iterator I = Condition->use_begin(),
++       E = Condition->use_end(); I != E; ++I) {
 +
-+  } else {
-+    // It's a branch from outside into our parent region
-+    Pred[BB] = True;
++    Instruction *User = dyn_cast<Instruction>(*I);
++    if (!User || User->getParent() != Parent)
++      continue;
++
++    if (match(*I, m_Not(m_Specific(Condition))))
++      return *I;
 +  }
-+}
 +
-+/// \brief Analyze the successors of each block and build up predicates
-+void AMDGPUStructurizeCFG::analyzeBlock(BasicBlock *BB) {
-+  pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
-+  BBPredicates &Pred = Predicates[BB];
++  // Last option: Create a new instruction
++  return BinaryOperator::CreateNot(Condition, "", Parent->getTerminator());
++}
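
The priority order in invert() is deliberate: fold a constant, strip a
double negation, reuse a NOT that already exists in the block, and only
as a last resort emit a new instruction. A toy model of that order, with
a small Expr struct standing in for llvm::Value (the real code
pattern-matches IR instead):

    #include <cstdio>
    #include <memory>
    #include <vector>

    struct Expr {
      bool IsConst;
      bool ConstVal;
      bool IsNot;
      Expr *Operand;                  // operand when IsNot is set
    };

    // Everything "in the block"; scanned when looking for a reusable NOT.
    static std::vector<std::unique_ptr<Expr> > Block;

    static Expr *makeExpr(bool IsConst, bool ConstVal, bool IsNot, Expr *Op) {
      Block.push_back(std::unique_ptr<Expr>(new Expr{IsConst, ConstVal, IsNot, Op}));
      return Block.back().get();
    }

    static Expr *invert(Expr *Cond) {
      if (Cond->IsConst)                          // 1. constant folding
        return makeExpr(true, !Cond->ConstVal, false, 0);
      if (Cond->IsNot)                            // 2. strip double negation
        return Cond->Operand;
      for (std::size_t I = 0; I != Block.size(); ++I)
        if (Block[I]->IsNot && Block[I]->Operand == Cond)
          return Block[I].get();                  // 3. reuse an existing NOT
      return makeExpr(false, false, true, Cond);  // 4. create a new one
    }

    int main() {
      Expr *C = makeExpr(false, false, false, 0); // some non-constant condition
      Expr *NotC = invert(C);                     // step 4 creates the NOT
      std::printf("reused:   %d\n", invert(C) == NotC);  // step 3 fires
      std::printf("stripped: %d\n", invert(NotC) == C);  // step 2 fires
      return 0;
    }

One detail the toy drops: the real invert() only reuses a negation from
the same basic block, since a NOT living elsewhere may not dominate the
new use.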
 +
-+  for (; PI != PE; ++PI) {
-+    BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
++/// \brief Build the condition for one edge
++Value *AMDGPUStructurizeCFG::buildCondition(BranchInst *Term, unsigned Idx,
++                                            bool Invert) {
++  Value *Cond = Invert ? BoolFalse : BoolTrue;
++  if (Term->isConditional()) {
++    Cond = Term->getCondition();
 +
-+    for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
-+      BasicBlock *Succ = Term->getSuccessor(i);
-+      if (Succ != BB)
-+        continue;
-+      buildPredicate(Term, i, Pred, false);
-+    }
++    if (Idx != Invert)
++      Cond = invert(Cond);
 +  }
++  return Cond;
 +}
 +
-+/// \brief Analyze the conditions leading to loop to a previous block
-+void AMDGPUStructurizeCFG::analyzeLoop(BasicBlock *BB, unsigned &LoopIdx) {
-+  BranchInst *Term = cast<BranchInst>(BB->getTerminator());
++/// \brief Analyze the predecessors of each block and build up predicates
++void AMDGPUStructurizeCFG::gatherPredicates(RegionNode *N) {
 +
-+  for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
-+    BasicBlock *Succ = Term->getSuccessor(i);
++  RegionInfo *RI = ParentRegion->getRegionInfo();
++  BasicBlock *BB = N->getEntry();
++  BBPredicates &Pred = Predicates[BB];
++  BBPredicates &LPred = LoopPreds[BB];
++
++  for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB);
++       PI != PE; ++PI) {
 +
-+    // Ignore it if it's not a back edge
-+    if (!Visited.count(Succ))
++    // Ignore it if it's a branch from outside into our region entry
++    if (!ParentRegion->contains(*PI))
 +      continue;
 +
-+    buildPredicate(Term, i, LoopPred, true);
++    Region *R = RI->getRegionFor(*PI);
++    if (R == ParentRegion) {
++
++      // It's a top level block in our region
++      BranchInst *Term = cast<BranchInst>((*PI)->getTerminator());
++      for (unsigned i = 0, e = Term->getNumSuccessors(); i != e; ++i) {
++        BasicBlock *Succ = Term->getSuccessor(i);
++        if (Succ != BB)
++          continue;
++
++        if (Visited.count(*PI)) {
++          // Normal forward edge
++          if (Term->isConditional()) {
++            // Try to treat it like an ELSE block
++            BasicBlock *Other = Term->getSuccessor(!i);
++            if (Visited.count(Other) && !Loops.count(Other) &&
++                !Pred.count(Other) && !Pred.count(*PI)) {
++                
++              Pred[Other] = BoolFalse;
++              Pred[*PI] = BoolTrue;
++              continue;
++            }
++          }
++          Pred[*PI] = buildCondition(Term, i, false);
++ 
++        } else {
++          // Back edge
++          LPred[*PI] = buildCondition(Term, i, true);
++        }
++      }
++
++    } else {
++
++      // It's an exit from a sub region
++      while(R->getParent() != ParentRegion)
++        R = R->getParent();
++
++      // Edge from inside a subregion to its entry, ignore it
++      if (R == N)
++        continue;
 +
-+    LoopEnd = BB;
-+    if (Visited[Succ] < LoopIdx) {
-+      LoopIdx = Visited[Succ];
-+      LoopStart = Succ;
++      BasicBlock *Entry = R->getEntry();
++      if (Visited.count(Entry))
++        Pred[Entry] = BoolTrue;
++      else
++        LPred[Entry] = BoolFalse;
 +    }
 +  }
 +}
 +
 +/// \brief Collect various loop and predicate infos
 +void AMDGPUStructurizeCFG::collectInfos() {
-+  unsigned Number = 0, LoopIdx = ~0;
 +
 +  // Reset predicate
 +  Predicates.clear();
 +
 +  // and loop infos
-+  LoopStart = LoopEnd = 0;
-+  LoopPred.clear();
++  Loops.clear();
++  LoopPreds.clear();
++
++  // Reset the visited nodes
++  Visited.clear();
 +
-+  RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
-+  for (Visited.clear(); OI != OE; Visited[(*OI++)->getEntry()] = ++Number) {
++  for (RNVector::reverse_iterator OI = Order.rbegin(), OE = Order.rend();
++       OI != OE; ++OI) {
 +
 +    // Analyze all the conditions leading to a node
-+    analyzeBlock((*OI)->getEntry());
++    gatherPredicates(*OI);
 +
-+    if ((*OI)->isSubRegion())
-+      continue;
++    // Remember that we've seen this node
++    Visited.insert((*OI)->getEntry());
 +
-+    // Find the first/last loop nodes and loop predicates
-+    analyzeLoop((*OI)->getNodeAs<BasicBlock>(), LoopIdx);
++    // Find the last back edges
++    analyzeLoops(*OI);
 +  }
 +}
 +
-+/// \brief Does A dominate all the predicates of B ?
-+bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *A, BasicBlock *B) {
-+  BBPredicates &Preds = Predicates[B];
-+  for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+       PI != PE; ++PI) {
++/// \brief Insert the missing branch conditions
++void AMDGPUStructurizeCFG::insertConditions(bool Loops) {
++  BranchVector &Conds = Loops ? LoopConds : Conditions;
++  Value *Default = Loops ? BoolTrue : BoolFalse;
++  SSAUpdater PhiInserter;
 +
-+    if (!DT->dominates(A, PI->first))
-+      return false;
-+  }
-+  return true;
-+}
++  for (BranchVector::iterator I = Conds.begin(),
++       E = Conds.end(); I != E; ++I) {
 +
-+/// \brief Remove phi values from all successors and the remove the terminator.
-+void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
-+  TerminatorInst *Term = BB->getTerminator();
-+  if (!Term)
-+    return;
++    BranchInst *Term = *I;
++    assert(Term->isConditional());
 +
-+  for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
-+       SI != SE; ++SI) {
++    BasicBlock *Parent = Term->getParent();
++    BasicBlock *SuccTrue = Term->getSuccessor(0);
++    BasicBlock *SuccFalse = Term->getSuccessor(1);
 +
-+    delPhiValues(BB, *SI);
-+  }
++    PhiInserter.Initialize(Boolean, "");
++    PhiInserter.AddAvailableValue(&Func->getEntryBlock(), Default);
++    PhiInserter.AddAvailableValue(Loops ? SuccFalse : Parent, Default);
 +
-+  Term->eraseFromParent();
-+}
++    BBPredicates &Preds = Loops ? LoopPreds[SuccFalse] : Predicates[SuccTrue];
 +
-+/// First: Skip forward to the first region node that either isn't a subregion or not
-+/// dominating it's exit, remove all the skipped nodes from the node order.
-+///
-+/// Second: Handle the first successor directly if the resulting nodes successor
-+/// predicates are still dominated by the original entry
-+RegionNode *AMDGPUStructurizeCFG::skipChained(RegionNode *Node) {
-+  BasicBlock *Entry = Node->getEntry();
++    NearestCommonDominator Dominator(DT);
++    Dominator.addBlock(Parent, false);
 +
-+  // Skip forward as long as it is just a linear flow
-+  while (true) {
-+    BasicBlock *Entry = Node->getEntry();
-+    BasicBlock *Exit;
++    Value *ParentValue = 0;
++    for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++         PI != PE; ++PI) {
 +
-+    if (Node->isSubRegion()) {
-+      Exit = Node->getNodeAs<Region>()->getExit();
-+    } else {
-+      TerminatorInst *Term = Entry->getTerminator();
-+      if (Term->getNumSuccessors() != 1)
++      if (PI->first == Parent) {
++        ParentValue = PI->second;
 +        break;
-+      Exit = Term->getSuccessor(0);
++      }
++      PhiInserter.AddAvailableValue(PI->first, PI->second);
++      Dominator.addBlock(PI->first);
 +    }
 +
-+    // It's a back edge, break here so we can insert a loop node
-+    if (!Visited.count(Exit))
-+      return Node;
-+
-+    // More than node edges are pointing to exit
-+    if (!DT->dominates(Entry, Exit))
-+      return Node;
-+
-+    RegionNode *Next = ParentRegion->getNode(Exit);
-+    RNVector::iterator I = std::find(Order.begin(), Order.end(), Next);
-+    assert(I != Order.end());
-+
-+    Visited.erase(Next->getEntry());
-+    Order.erase(I);
-+    Node = Next;
-+  }
++    if (ParentValue) {
++      Term->setCondition(ParentValue);
++    } else {
++      if (!Dominator.wasResultExplicitMentioned())
++        PhiInserter.AddAvailableValue(Dominator.getResult(), Default);
 +
-+  BasicBlock *BB = Node->getEntry();
-+  TerminatorInst *Term = BB->getTerminator();
-+  if (Term->getNumSuccessors() != 2)
-+    return Node;
-+
-+  // Our node has exactly two succesors, check if we can handle
-+  // any of them directly
-+  BasicBlock *Succ = Term->getSuccessor(0);
-+  if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ)) {
-+    Succ = Term->getSuccessor(1);
-+    if (!Visited.count(Succ) || !dominatesPredicates(Entry, Succ))
-+      return Node;
-+  } else {
-+    BasicBlock *Succ2 = Term->getSuccessor(1);
-+    if (Visited.count(Succ2) && Visited[Succ] > Visited[Succ2] &&
-+        dominatesPredicates(Entry, Succ2))
-+      Succ = Succ2;
++      Term->setCondition(PhiInserter.GetValueInMiddleOfBlock(Parent));
++    }
 +  }
-+
-+  RegionNode *Next = ParentRegion->getNode(Succ);
-+  RNVector::iterator E = Order.end();
-+  RNVector::iterator I = std::find(Order.begin(), E, Next);
-+  assert(I != E);
-+
-+  killTerminator(BB);
-+  FlowsInserted.push_back(BB);
-+  Visited.erase(Succ);
-+  Order.erase(I);
-+  return ParentRegion->getNode(wireFlowBlock(BB, Next));
 +}
 +
 +/// \brief Remove all PHI values coming from "From" into "To" and remember
@@ -2990,224 +3269,306 @@ index 0000000..22338b5
 +  }
 +}
 +
-+/// \brief Add the PHI values back once we knew the new predecessor
++/// \brief Add a dummy PHI value as soon as we know the new predecessor
 +void AMDGPUStructurizeCFG::addPhiValues(BasicBlock *From, BasicBlock *To) {
-+  if (!DeletedPhis.count(To))
-+    return;
++  for (BasicBlock::iterator I = To->begin(), E = To->end();
++       I != E && isa<PHINode>(*I);) {
 +
-+  PhiMap &Map = DeletedPhis[To];
++    PHINode &Phi = cast<PHINode>(*I++);
++    Value *Undef = UndefValue::get(Phi.getType());
++    Phi.addIncoming(Undef, From);
++  }
++  AddedPhis[To].push_back(From);
++}
++
++/// \brief Add the real PHI value as soon as everything is set up
++void AMDGPUStructurizeCFG::setPhiValues() {
++  
 +  SSAUpdater Updater;
++  for (BB2BBVecMap::iterator AI = AddedPhis.begin(), AE = AddedPhis.end();
++       AI != AE; ++AI) {
 +
-+  for (PhiMap::iterator I = Map.begin(), E = Map.end(); I != E; ++I) {
++    BasicBlock *To = AI->first;
++    BBVector &From = AI->second;
 +
-+    PHINode *Phi = I->first;
-+    Updater.Initialize(Phi->getType(), "");
-+    BasicBlock *Fallback = To;
-+    bool HaveFallback = false;
++    if (!DeletedPhis.count(To))
++      continue;
 +
-+    for (BBValueVector::iterator VI = I->second.begin(), VE = I->second.end();
-+         VI != VE; ++VI) {
++    PhiMap &Map = DeletedPhis[To];
++    for (PhiMap::iterator PI = Map.begin(), PE = Map.end();
++         PI != PE; ++PI) {
 +
-+      Updater.AddAvailableValue(VI->first, VI->second);
-+      BasicBlock *Dom = DT->findNearestCommonDominator(Fallback, VI->first);
-+      if (Dom == VI->first)
-+        HaveFallback = true;
-+      else if (Dom != Fallback)
-+        HaveFallback = false;
-+      Fallback = Dom;
-+    }
-+    if (!HaveFallback) {
++      PHINode *Phi = PI->first;
 +      Value *Undef = UndefValue::get(Phi->getType());
-+      Updater.AddAvailableValue(Fallback, Undef);
++      Updater.Initialize(Phi->getType(), "");
++      Updater.AddAvailableValue(&Func->getEntryBlock(), Undef);
++      Updater.AddAvailableValue(To, Undef);
++
++      NearestCommonDominator Dominator(DT);
++      Dominator.addBlock(To, false);
++      for (BBValueVector::iterator VI = PI->second.begin(),
++           VE = PI->second.end(); VI != VE; ++VI) {
++
++        Updater.AddAvailableValue(VI->first, VI->second);
++        Dominator.addBlock(VI->first);
++      }
++
++      if (!Dominator.wasResultExplicitMentioned())
++        Updater.AddAvailableValue(Dominator.getResult(), Undef);
++
++      for (BBVector::iterator FI = From.begin(), FE = From.end();
++           FI != FE; ++FI) {
++
++        int Idx = Phi->getBasicBlockIndex(*FI);
++        assert(Idx != -1);
++        Phi->setIncomingValue(Idx, Updater.GetValueAtEndOfBlock(*FI));
++      }
++    }
++
++    DeletedPhis.erase(To);
++  }
++  assert(DeletedPhis.empty());
++}
++
++/// \brief Remove phi values from all successors and then remove the terminator.
++void AMDGPUStructurizeCFG::killTerminator(BasicBlock *BB) {
++  TerminatorInst *Term = BB->getTerminator();
++  if (!Term)
++    return;
++
++  for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
++       SI != SE; ++SI) {
++
++    delPhiValues(BB, *SI);
++  }
++
++  Term->eraseFromParent();
++}
++
++/// \brief Let node exit(s) point to NewExit
++void AMDGPUStructurizeCFG::changeExit(RegionNode *Node, BasicBlock *NewExit,
++                                      bool IncludeDominator) {
++
++  if (Node->isSubRegion()) {
++    Region *SubRegion = Node->getNodeAs<Region>();
++    BasicBlock *OldExit = SubRegion->getExit();
++    BasicBlock *Dominator = 0;
++
++    // Find all the edges from the sub region to the exit
++    for (pred_iterator I = pred_begin(OldExit), E = pred_end(OldExit);
++         I != E;) {
++
++      BasicBlock *BB = *I++;
++      if (!SubRegion->contains(BB))
++        continue;
++
++      // Modify the edges to point to the new exit
++      delPhiValues(BB, OldExit);
++      BB->getTerminator()->replaceUsesOfWith(OldExit, NewExit);
++      addPhiValues(BB, NewExit);
++
++      // Find the new dominator (if requested)
++      if (IncludeDominator) {
++        if (!Dominator)
++          Dominator = BB;
++        else
++          Dominator = DT->findNearestCommonDominator(Dominator, BB);
++      }
 +    }
 +
-+    Phi->addIncoming(Updater.GetValueAtEndOfBlock(From), From);
++    // Change the dominator (if requested)
++    if (Dominator)
++      DT->changeImmediateDominator(NewExit, Dominator);
++
++    // Update the region info
++    SubRegion->replaceExit(NewExit);
++
++  } else {
++    BasicBlock *BB = Node->getNodeAs<BasicBlock>();
++    killTerminator(BB);
++    BranchInst::Create(NewExit, BB);
++    addPhiValues(BB, NewExit);
++    if (IncludeDominator)
++      DT->changeImmediateDominator(NewExit, BB);
 +  }
-+  DeletedPhis.erase(To);
 +}
 +
 +/// \brief Create a new flow node and update dominator tree and region info
-+BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Prev) {
++BasicBlock *AMDGPUStructurizeCFG::getNextFlow(BasicBlock *Dominator) {
 +  LLVMContext &Context = Func->getContext();
 +  BasicBlock *Insert = Order.empty() ? ParentRegion->getExit() :
 +                       Order.back()->getEntry();
 +  BasicBlock *Flow = BasicBlock::Create(Context, FlowBlockName,
 +                                        Func, Insert);
-+  DT->addNewBlock(Flow, Prev);
++  DT->addNewBlock(Flow, Dominator);
 +  ParentRegion->getRegionInfo()->setRegionFor(Flow, ParentRegion);
-+  FlowsInserted.push_back(Flow);
 +  return Flow;
 +}
 +
++/// \brief Create a new flow node or reuse the previous one
++BasicBlock *AMDGPUStructurizeCFG::needPrefix(bool NeedEmpty) {
++
++  BasicBlock *Entry = PrevNode->getEntry();
++
++  if (!PrevNode->isSubRegion()) {
++    killTerminator(Entry);
++    if (!NeedEmpty || Entry->getFirstInsertionPt() == Entry->end())
++      return Entry;
++
++  } 
++
++  // create a new flow node
++  BasicBlock *Flow = getNextFlow(Entry);
++
++  // and wire it up
++  changeExit(PrevNode, Flow, true);
++  PrevNode = ParentRegion->getBBNode(Flow);
++  return Flow;
++}
++
++/// \brief Returns the region exit if possible, otherwise just a new flow node
++BasicBlock *AMDGPUStructurizeCFG::needPostfix(BasicBlock *Flow,
++                                              bool ExitUseAllowed) {
++
++  if (Order.empty() && ExitUseAllowed) {
++    BasicBlock *Exit = ParentRegion->getExit();
++    DT->changeImmediateDominator(Exit, Flow);
++    addPhiValues(Flow, Exit);
++    return Exit;
++  }
++  return getNextFlow(Flow);
++}
++
++/// \brief Set the previous node
++void AMDGPUStructurizeCFG::setPrevNode(BasicBlock *BB) {
++  PrevNode =  ParentRegion->contains(BB) ? ParentRegion->getBBNode(BB) : 0;
++}
++
++/// \brief Does BB dominate all the predicates of Node?
++bool AMDGPUStructurizeCFG::dominatesPredicates(BasicBlock *BB, RegionNode *Node) {
++  BBPredicates &Preds = Predicates[Node->getEntry()];
++  for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
++       PI != PE; ++PI) {
++
++    if (!DT->dominates(BB, PI->first))
++      return false;
++  }
++  return true;
++}
++
 +/// \brief Can we predict that this node will always be called?
-+bool AMDGPUStructurizeCFG::isPredictableTrue(BasicBlock *Prev,
-+                                             BasicBlock *Node) {
-+  BBPredicates &Preds = Predicates[Node];
++bool AMDGPUStructurizeCFG::isPredictableTrue(RegionNode *Node) {
++
++  BBPredicates &Preds = Predicates[Node->getEntry()];
 +  bool Dominated = false;
 +
++  // The region entry is always true
++  if (PrevNode == 0)
++    return true;
++
 +  for (BBPredicates::iterator I = Preds.begin(), E = Preds.end();
 +       I != E; ++I) {
 +
 +    if (I->second != BoolTrue)
 +      return false;
 +
-+    if (!Dominated && DT->dominates(I->first, Prev))
++    if (!Dominated && DT->dominates(I->first, PrevNode->getEntry()))
 +      Dominated = true;
 +  }
++
++  // TODO: The dominator check is too strict
 +  return Dominated;
 +}
 +
-+/// \brief Wire up the new control flow by inserting or updating the branch
-+/// instructions at node exits
-+BasicBlock *AMDGPUStructurizeCFG::wireFlowBlock(BasicBlock *Prev,
-+                                                RegionNode *Node) {
-+  BasicBlock *Entry = Node->getEntry();
-+
-+  if (LoopStart == Entry) {
-+    LoopStart = Prev;
-+    LoopPred[Prev] = BoolTrue;
-+  }
++/// Take one node from the order vector and wire it up
++void AMDGPUStructurizeCFG::wireFlow(bool ExitUseAllowed,
++                                    BasicBlock *LoopEnd) {
 +
-+  // Wire it up temporary, skipChained may recurse into us
-+  BranchInst::Create(Entry, Prev);
-+  DT->changeImmediateDominator(Entry, Prev);
-+  addPhiValues(Prev, Entry);
++  RegionNode *Node = Order.pop_back_val();
++  Visited.insert(Node->getEntry());
 +
-+  Node = skipChained(Node);
++  if (isPredictableTrue(Node)) {
++    // Just a linear flow
++    if (PrevNode) {
++      changeExit(PrevNode, Node->getEntry(), true);
++    }
++    PrevNode = Node;
 +
-+  BasicBlock *Next = getNextFlow(Prev);
-+  if (!isPredictableTrue(Prev, Entry)) {
-+    // Let Prev point to entry and next block
-+    Prev->getTerminator()->eraseFromParent();
-+    BranchInst::Create(Entry, Next, BoolUndef, Prev);
 +  } else {
-+    DT->changeImmediateDominator(Next, Entry);
-+  }
++    // Insert extra prefix node (or reuse last one)
++    BasicBlock *Flow = needPrefix(false);
 +
-+  // Let node exit(s) point to next block
-+  if (Node->isSubRegion()) {
-+    Region *SubRegion = Node->getNodeAs<Region>();
-+    BasicBlock *Exit = SubRegion->getExit();
++    // Insert extra postfix node (or use exit instead)
++    BasicBlock *Entry = Node->getEntry();
++    BasicBlock *Next = needPostfix(Flow, ExitUseAllowed);
 +
-+    // Find all the edges from the sub region to the exit
-+    BBVector ToDo;
-+    for (pred_iterator I = pred_begin(Exit), E = pred_end(Exit); I != E; ++I) {
-+      if (SubRegion->contains(*I))
-+        ToDo.push_back(*I);
-+    }
++    // let it point to entry and next block
++    Conditions.push_back(BranchInst::Create(Entry, Next, BoolUndef, Flow));
++    addPhiValues(Flow, Entry);
++    DT->changeImmediateDominator(Entry, Flow);
 +
-+    // Modify the edges to point to the new flow block
-+    for (BBVector::iterator I = ToDo.begin(), E = ToDo.end(); I != E; ++I) {
-+      delPhiValues(*I, Exit);
-+      TerminatorInst *Term = (*I)->getTerminator();
-+      Term->replaceUsesOfWith(Exit, Next);
++    PrevNode = Node;
++    while (!Order.empty() && !Visited.count(LoopEnd) &&
++           dominatesPredicates(Entry, Order.back())) {
++      handleLoops(false, LoopEnd);
 +    }
 +
-+    // Update the region info
-+    SubRegion->replaceExit(Next);
-+
-+  } else {
-+    BasicBlock *BB = Node->getNodeAs<BasicBlock>();
-+    killTerminator(BB);
-+    BranchInst::Create(Next, BB);
-+
-+    if (BB == LoopEnd)
-+      LoopEnd = 0;
++    changeExit(PrevNode, Next, false);
++    setPrevNode(Next);
 +  }
-+
-+  return Next;
 +}
 +
-+/// Destroy node order and visited map, build up flow order instead.
-+/// After this function control flow looks like it should be, but
-+/// branches only have undefined conditions.
-+void AMDGPUStructurizeCFG::createFlow() {
-+  DeletedPhis.clear();
-+
-+  BasicBlock *Prev = Order.pop_back_val()->getEntry();
-+  assert(Prev == ParentRegion->getEntry() && "Incorrect node order!");
-+  Visited.erase(Prev);
-+
-+  if (LoopStart == Prev) {
-+    // Loop starts at entry, split entry so that we can predicate it
-+    BasicBlock::iterator Insert = Prev->getFirstInsertionPt();
-+    BasicBlock *Split = Prev->splitBasicBlock(Insert, FlowBlockName);
-+    DT->addNewBlock(Split, Prev);
-+    ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+    Predicates[Split] = Predicates[Prev];
-+    Order.push_back(ParentRegion->getBBNode(Split));
-+    LoopPred[Prev] = BoolTrue;
-+
-+  } else if (LoopStart == Order.back()->getEntry()) {
-+    // Loop starts behind entry, split entry so that we can jump to it
-+    Instruction *Term = Prev->getTerminator();
-+    BasicBlock *Split = Prev->splitBasicBlock(Term, FlowBlockName);
-+    DT->addNewBlock(Split, Prev);
-+    ParentRegion->getRegionInfo()->setRegionFor(Split, ParentRegion);
-+    Prev = Split;
-+  }
-+
-+  killTerminator(Prev);
-+  FlowsInserted.clear();
-+  FlowsInserted.push_back(Prev);
++void AMDGPUStructurizeCFG::handleLoops(bool ExitUseAllowed,
++                                       BasicBlock *LoopEnd) {
++  RegionNode *Node = Order.back();
++  BasicBlock *LoopStart = Node->getEntry();
 +
-+  while (!Order.empty()) {
-+    RegionNode *Node = Order.pop_back_val();
-+    Visited.erase(Node->getEntry());
-+    Prev = wireFlowBlock(Prev, Node);
-+    if (LoopStart && !LoopEnd) {
-+      // Create an extra loop end node
-+      LoopEnd = Prev;
-+      Prev = getNextFlow(LoopEnd);
-+      BranchInst::Create(Prev, LoopStart, BoolUndef, LoopEnd);
-+      addPhiValues(LoopEnd, LoopStart);
-+    }
++  if (!Loops.count(LoopStart)) {
++    wireFlow(ExitUseAllowed, LoopEnd);
++    return;
 +  }
 +
-+  BasicBlock *Exit = ParentRegion->getExit();
-+  BranchInst::Create(Exit, Prev);
-+  addPhiValues(Prev, Exit);
-+  if (DT->dominates(ParentRegion->getEntry(), Exit))
-+    DT->changeImmediateDominator(Exit, Prev);
-+
-+  if (LoopStart && LoopEnd) {
-+    BBVector::iterator FI = std::find(FlowsInserted.begin(),
-+                                      FlowsInserted.end(),
-+                                      LoopStart);
-+    for (; *FI != LoopEnd; ++FI) {
-+      addPhiValues(*FI, (*FI)->getTerminator()->getSuccessor(0));
-+    }
++  if (!isPredictableTrue(Node))
++    LoopStart = needPrefix(true);
++
++  LoopEnd = Loops[Node->getEntry()];
++  wireFlow(false, LoopEnd);
++  while (!Visited.count(LoopEnd)) {
++    handleLoops(false, LoopEnd);
 +  }
 +
-+  assert(Order.empty());
-+  assert(Visited.empty());
-+  assert(DeletedPhis.empty());
++  // Create an extra loop end node
++  LoopEnd = needPrefix(false);
++  BasicBlock *Next = needPostfix(LoopEnd, ExitUseAllowed);
++  LoopConds.push_back(BranchInst::Create(Next, LoopStart,
++                                         BoolUndef, LoopEnd));
++  addPhiValues(LoopEnd, LoopStart);
++  setPrevNode(Next);
 +}
 +
-+/// \brief Insert the missing branch conditions
-+void AMDGPUStructurizeCFG::insertConditions() {
-+  SSAUpdater PhiInserter;
-+
-+  for (BBVector::iterator FI = FlowsInserted.begin(), FE = FlowsInserted.end();
-+       FI != FE; ++FI) {
-+
-+    BranchInst *Term = cast<BranchInst>((*FI)->getTerminator());
-+    if (Term->isUnconditional())
-+      continue;
++/// After this function control flow looks like it should be, but
++/// branches and PHI nodes still carry only undefined conditions and values.
++void AMDGPUStructurizeCFG::createFlow() {
 +
-+    PhiInserter.Initialize(Boolean, "");
-+    PhiInserter.AddAvailableValue(&Func->getEntryBlock(), BoolFalse);
++  BasicBlock *Exit = ParentRegion->getExit();
++  bool EntryDominatesExit = DT->dominates(ParentRegion->getEntry(), Exit);
 +
-+    BasicBlock *Succ = Term->getSuccessor(0);
-+    BBPredicates &Preds = (*FI == LoopEnd) ? LoopPred : Predicates[Succ];
-+    for (BBPredicates::iterator PI = Preds.begin(), PE = Preds.end();
-+         PI != PE; ++PI) {
++  DeletedPhis.clear();
++  AddedPhis.clear();
++  Conditions.clear();
++  LoopConds.clear();
 +
-+      PhiInserter.AddAvailableValue(PI->first, PI->second);
-+    }
++  PrevNode = 0;
++  Visited.clear();
 +
-+    Term->setCondition(PhiInserter.GetValueAtEndOfBlock(*FI));
++  while (!Order.empty()) {
++    handleLoops(EntryDominatesExit, 0);
 +  }
++
++  if (PrevNode)
++    changeExit(PrevNode, Exit, EntryDominatesExit);
++  else
++    assert(EntryDominatesExit);
 +}
 +
 +/// Handle a rare case where the disintegrated nodes instructions
@@ -3265,14 +3626,21 @@ index 0000000..22338b5
 +  orderNodes();
 +  collectInfos();
 +  createFlow();
-+  insertConditions();
++  insertConditions(false);
++  insertConditions(true);
++  setPhiValues();
 +  rebuildSSA();
 +
++  // Cleanup
 +  Order.clear();
 +  Visited.clear();
-+  Predicates.clear();
 +  DeletedPhis.clear();
-+  FlowsInserted.clear();
++  AddedPhis.clear();
++  Predicates.clear();
++  Conditions.clear();
++  Loops.clear();
++  LoopPreds.clear();
++  LoopConds.clear();
 +
 +  return true;
 +}
@@ -3447,10 +3815,10 @@ index 0000000..cab7884
 +#endif // AMDGPUSUBTARGET_H
 diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
 new file mode 100644
-index 0000000..d09dc2e
+index 0000000..e2f00be
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
-@@ -0,0 +1,142 @@
+@@ -0,0 +1,153 @@
 +//===-- AMDGPUTargetMachine.cpp - TargetMachine for hw codegen targets-----===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -3555,6 +3923,12 @@ index 0000000..d09dc2e
 +bool AMDGPUPassConfig::addInstSelector() {
 +  addPass(createAMDGPUPeepholeOpt(*TM));
 +  addPass(createAMDGPUISelDag(getAMDGPUTargetMachine()));
++
++  const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++  if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
++    // The callbacks this pass uses are not implemented yet on SI.
++    addPass(createAMDGPUIndirectAddressingPass(*TM));
++  }
 +  return false;
 +}
 +
@@ -3569,6 +3943,11 @@ index 0000000..d09dc2e
 +}
 +
 +bool AMDGPUPassConfig::addPostRegAlloc() {
++  const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
++
++  if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
++    addPass(createSIInsertWaits(*TM));
++  }
 +  return false;
 +}
 +
@@ -3585,8 +3964,8 @@ index 0000000..d09dc2e
 +    addPass(createAMDGPUCFGStructurizerPass(*TM));
 +    addPass(createR600ExpandSpecialInstrsPass(*TM));
 +    addPass(&FinalizeMachineBundlesID);
++    addPass(createR600LowerConstCopy(*TM));
 +  } else {
-+    addPass(createSILowerLiteralConstantsPass(*TM));
 +    addPass(createSILowerControlFlowPass(*TM));
 +  }
 +
@@ -3595,7 +3974,7 @@ index 0000000..d09dc2e
 +
 diff --git a/lib/Target/R600/AMDGPUTargetMachine.h b/lib/Target/R600/AMDGPUTargetMachine.h
 new file mode 100644
-index 0000000..399e55c
+index 0000000..5a1dcf4
 --- /dev/null
 +++ b/lib/Target/R600/AMDGPUTargetMachine.h
 @@ -0,0 +1,70 @@
@@ -3616,9 +3995,9 @@ index 0000000..399e55c
 +#ifndef AMDGPU_TARGET_MACHINE_H
 +#define AMDGPU_TARGET_MACHINE_H
 +
++#include "AMDGPUFrameLowering.h"
 +#include "AMDGPUInstrInfo.h"
 +#include "AMDGPUSubtarget.h"
-+#include "AMDILFrameLowering.h"
 +#include "AMDILIntrinsicInfo.h"
 +#include "R600ISelLowering.h"
 +#include "llvm/ADT/OwningPtr.h"
@@ -3671,10 +4050,10 @@ index 0000000..399e55c
 +#endif // AMDGPU_TARGET_MACHINE_H
 diff --git a/lib/Target/R600/AMDIL.h b/lib/Target/R600/AMDIL.h
 new file mode 100644
-index 0000000..4e577dc
+index 0000000..b39fbdb
 --- /dev/null
 +++ b/lib/Target/R600/AMDIL.h
-@@ -0,0 +1,106 @@
+@@ -0,0 +1,122 @@
 +//===-- AMDIL.h - Top-level interface for AMDIL representation --*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -3767,14 +4146,30 @@ index 0000000..4e577dc
 +enum AddressSpaces {
 +  PRIVATE_ADDRESS  = 0, ///< Address space for private memory.
 +  GLOBAL_ADDRESS   = 1, ///< Address space for global memory (RAT0, VTX0).
-+  CONSTANT_ADDRESS = 2, ///< Address space for constant memory.
++  CONSTANT_ADDRESS = 2, ///< Address space for constant memory
 +  LOCAL_ADDRESS    = 3, ///< Address space for local memory.
 +  REGION_ADDRESS   = 4, ///< Address space for region memory.
 +  ADDRESS_NONE     = 5, ///< Address space for unknown memory.
 +  PARAM_D_ADDRESS  = 6, ///< Address space for directly addressable parameter memory (CONST0)
 +  PARAM_I_ADDRESS  = 7, ///< Address space for indirectly addressable parameter memory (VTX1)
 +  USER_SGPR_ADDRESS = 8, ///< Address space for USER_SGPRS on SI
-+  LAST_ADDRESS     = 9
++  CONSTANT_BUFFER_0 = 9,
++  CONSTANT_BUFFER_1 = 10,
++  CONSTANT_BUFFER_2 = 11,
++  CONSTANT_BUFFER_3 = 12,
++  CONSTANT_BUFFER_4 = 13,
++  CONSTANT_BUFFER_5 = 14,
++  CONSTANT_BUFFER_6 = 15,
++  CONSTANT_BUFFER_7 = 16,
++  CONSTANT_BUFFER_8 = 17,
++  CONSTANT_BUFFER_9 = 18,
++  CONSTANT_BUFFER_10 = 19,
++  CONSTANT_BUFFER_11 = 20,
++  CONSTANT_BUFFER_12 = 21,
++  CONSTANT_BUFFER_13 = 22,
++  CONSTANT_BUFFER_14 = 23,
++  CONSTANT_BUFFER_15 = 24,
++  LAST_ADDRESS     = 25
 +};
 +
 +} // namespace AMDGPUAS
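
The sixteen new CONSTANT_BUFFER_* entries are laid out consecutively, so a
buffer index can be recovered by subtracting the first entry. A minimal
standalone sketch of that relationship (the helper name is hypothetical, not
part of the patch):

    #include <cassert>

    // Mirrors the enum layout above: CONSTANT_BUFFER_0 == 9 up to
    // CONSTANT_BUFFER_15 == 24.
    static unsigned constantBufferIndex(unsigned AddrSpace) {
      assert(AddrSpace >= 9 && AddrSpace <= 24 && "not a CONSTANT_BUFFER_* space");
      return AddrSpace - 9; // e.g. address space 12 -> constant buffer 3
    }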
@@ -4073,10 +4468,10 @@ index 0000000..c12cedc
 +
 diff --git a/lib/Target/R600/AMDILCFGStructurizer.cpp b/lib/Target/R600/AMDILCFGStructurizer.cpp
 new file mode 100644
-index 0000000..9de97b6
+index 0000000..568d281
 --- /dev/null
 +++ b/lib/Target/R600/AMDILCFGStructurizer.cpp
-@@ -0,0 +1,3049 @@
+@@ -0,0 +1,3045 @@
 +//===-- AMDILCFGStructurizer.cpp - CFG Structurizer -----------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -6101,9 +6496,7 @@ index 0000000..9de97b6
 +      CFGTraits::insertAssignInstrBefore(insertPos, passRep, immReg, 1);
 +      InstrT *newInstr = 
 +        CFGTraits::insertInstrBefore(insertPos, AMDGPU::BRANCH_COND_i32, passRep);
-+      MachineInstrBuilder MIB(*funcRep, newInstr);
-+      MIB.addMBB(loopHeader);
-+      MIB.addReg(immReg, false);
++      MachineInstrBuilder(newInstr).addMBB(loopHeader).addReg(immReg, false);
 +
 +      SHOWNEWINSTR(newInstr);
 +
@@ -6925,12 +7318,13 @@ index 0000000..9de97b6
 +    MachineInstr *oldInstr = &(*instrPos);
 +    const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
 +    MachineBasicBlock *blk = oldInstr->getParent();
-+    MachineFunction *MF = blk->getParent();
-+    MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
++    MachineInstr *newInstr =
++      blk->getParent()->CreateMachineInstr(tii->get(newOpcode),
++                                           DL);
 +
 +    blk->insert(instrPos, newInstr);
-+    MachineInstrBuilder MIB(*MF, newInstr);
-+    MIB.addReg(oldInstr->getOperand(1).getReg(), false);
++    MachineInstrBuilder(newInstr).addReg(oldInstr->getOperand(1).getReg(),
++                                         false);
 +
 +    SHOWNEWINSTR(newInstr);
 +    //erase later oldInstr->eraseFromParent();
@@ -6943,13 +7337,13 @@ index 0000000..9de97b6
 +                                     RegiT regNum,
 +                                     DebugLoc DL) {
 +    const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
-+    MachineFunction *MF = blk->getParent();
 +
-+    MachineInstr *newInstr = MF->CreateMachineInstr(tii->get(newOpcode), DL);
++    MachineInstr *newInstr =
++      blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DL);
 +
 +    //insert before
 +    blk->insert(insertPos, newInstr);
-+    MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
++    MachineInstrBuilder(newInstr).addReg(regNum, false);
 +
 +    SHOWNEWINSTR(newInstr);
 +  } //insertCondBranchBefore
@@ -6959,12 +7353,11 @@ index 0000000..9de97b6
 +                                  AMDGPUCFGStructurizer *passRep,
 +                                  RegiT regNum) {
 +    const TargetInstrInfo *tii = passRep->getTargetInstrInfo();
-+    MachineFunction *MF = blk->getParent();
 +    MachineInstr *newInstr =
-+      MF->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
++      blk->getParent()->CreateMachineInstr(tii->get(newOpcode), DebugLoc());
 +
 +    blk->push_back(newInstr);
-+    MachineInstrBuilder(*MF, newInstr).addReg(regNum, false);
++    MachineInstrBuilder(newInstr).addReg(regNum, false);
 +
 +    SHOWNEWINSTR(newInstr);
 +  } //insertCondBranchEnd
@@ -7009,14 +7402,12 @@ index 0000000..9de97b6
 +                                       RegiT src2Reg) {
 +    const AMDGPUInstrInfo *tii =
 +             static_cast<const AMDGPUInstrInfo *>(passRep->getTargetInstrInfo());
-+    MachineFunction *MF = blk->getParent();
 +    MachineInstr *newInstr =
-+      MF->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
++      blk->getParent()->CreateMachineInstr(tii->get(tii->getIEQOpcode()), DebugLoc());
 +
-+    MachineInstrBuilder MIB(*MF, newInstr);
-+    MIB.addReg(dstReg, RegState::Define); //set target
-+    MIB.addReg(src1Reg); //set src value
-+    MIB.addReg(src2Reg); //set src value
++    MachineInstrBuilder(newInstr).addReg(dstReg, RegState::Define); //set target
++    MachineInstrBuilder(newInstr).addReg(src1Reg); //set src value
++    MachineInstrBuilder(newInstr).addReg(src2Reg); //set src value
 +
 +    blk->insert(instrPos, newInstr);
 +    SHOWNEWINSTR(newInstr);
@@ -7872,13 +8263,13 @@ index 0000000..6dc2deb
 +  
 +} // namespace llvm
 +#endif // AMDILEVERGREENDEVICE_H
-diff --git a/lib/Target/R600/AMDILFrameLowering.cpp b/lib/Target/R600/AMDILFrameLowering.cpp
+diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
 new file mode 100644
-index 0000000..9ad495a
+index 0000000..2e726e9
 --- /dev/null
-+++ b/lib/Target/R600/AMDILFrameLowering.cpp
-@@ -0,0 +1,47 @@
-+//===----------------------- AMDILFrameLowering.cpp -----------------*- C++ -*-===//
++++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
+@@ -0,0 +1,577 @@
++//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
 +//
 +//                     The LLVM Compiler Infrastructure
 +//
@@ -7888,119 +8279,21 @@ index 0000000..9ad495a
 +//==-----------------------------------------------------------------------===//
 +//
 +/// \file
-+/// \brief Interface to describe a layout of a stack frame on a AMDGPU target
-+/// machine.
++/// \brief Defines an instruction selector for the AMDGPU target.
 +//
 +//===----------------------------------------------------------------------===//
-+#include "AMDILFrameLowering.h"
-+#include "llvm/CodeGen/MachineFrameInfo.h"
-+
-+using namespace llvm;
-+AMDGPUFrameLowering::AMDGPUFrameLowering(StackDirection D, unsigned StackAl,
-+    int LAO, unsigned TransAl)
-+  : TargetFrameLowering(D, StackAl, LAO, TransAl) {
-+}
-+
-+AMDGPUFrameLowering::~AMDGPUFrameLowering() {
-+}
-+
-+int AMDGPUFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
-+                                         int FI) const {
-+  const MachineFrameInfo *MFI = MF.getFrameInfo();
-+  return MFI->getObjectOffset(FI);
-+}
-+
-+const TargetFrameLowering::SpillSlot *
-+AMDGPUFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const {
-+  NumEntries = 0;
-+  return 0;
-+}
-+void
-+AMDGPUFrameLowering::emitPrologue(MachineFunction &MF) const {
-+}
-+void
-+AMDGPUFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const {
-+}
-+bool
-+AMDGPUFrameLowering::hasFP(const MachineFunction &MF) const {
-+  return false;
-+}
-diff --git a/lib/Target/R600/AMDILFrameLowering.h b/lib/Target/R600/AMDILFrameLowering.h
-new file mode 100644
-index 0000000..51337c3
---- /dev/null
-+++ b/lib/Target/R600/AMDILFrameLowering.h
-@@ -0,0 +1,40 @@
-+//===--------------------- AMDILFrameLowering.h -----------------*- C++ -*-===//
-+//
-+//                     The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief Interface to describe a layout of a stack frame on a AMDIL target
-+/// machine.
-+//
-+//===----------------------------------------------------------------------===//
-+#ifndef AMDILFRAME_LOWERING_H
-+#define AMDILFRAME_LOWERING_H
-+
-+#include "llvm/CodeGen/MachineFunction.h"
-+#include "llvm/Target/TargetFrameLowering.h"
-+
-+namespace llvm {
-+
-+/// \brief Information about the stack frame layout on the AMDGPU targets.
-+///
-+/// It holds the direction of the stack growth, the known stack alignment on
-+/// entry to each function, and the offset to the locals area.
-+/// See TargetFrameInfo for more comments.
-+class AMDGPUFrameLowering : public TargetFrameLowering {
-+public:
-+  AMDGPUFrameLowering(StackDirection D, unsigned StackAl, int LAO,
-+                      unsigned TransAl = 1);
-+  virtual ~AMDGPUFrameLowering();
-+  virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;
-+  virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const;
-+  virtual void emitPrologue(MachineFunction &MF) const;
-+  virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
-+  virtual bool hasFP(const MachineFunction &MF) const;
-+};
-+} // namespace llvm
-+#endif // AMDILFRAME_LOWERING_H
-diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
-new file mode 100644
-index 0000000..d15ed39
---- /dev/null
-+++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
-@@ -0,0 +1,485 @@
-+//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
-+//
-+//                     The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//==-----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief Defines an instruction selector for the AMDGPU target.
-+//
-+//===----------------------------------------------------------------------===//
-+#include "AMDGPUInstrInfo.h"
-+#include "AMDGPUISelLowering.h" // For AMDGPUISD
-+#include "AMDGPURegisterInfo.h"
-+#include "AMDILDevices.h"
-+#include "R600InstrInfo.h"
-+#include "llvm/ADT/ValueMap.h"
-+#include "llvm/CodeGen/PseudoSourceValue.h"
-+#include "llvm/CodeGen/SelectionDAGISel.h"
-+#include "llvm/Support/Compiler.h"
-+#include <list>
-+#include <queue>
++#include "AMDGPUInstrInfo.h"
++#include "AMDGPUISelLowering.h" // For AMDGPUISD
++#include "AMDGPURegisterInfo.h"
++#include "AMDILDevices.h"
++#include "R600InstrInfo.h"
++#include "llvm/ADT/ValueMap.h"
++#include "llvm/CodeGen/PseudoSourceValue.h"
++#include "llvm/CodeGen/SelectionDAGISel.h"
++#include "llvm/Support/Compiler.h"
++#include "llvm/CodeGen/SelectionDAG.h"
++#include <list>
++#include <queue>
 +
 +using namespace llvm;
 +
@@ -8024,6 +8317,7 @@ index 0000000..d15ed39
 +
 +private:
 +  inline SDValue getSmallIPtrImm(unsigned Imm);
++  bool FoldOperands(unsigned, const R600InstrInfo *, std::vector<SDValue> &);
 +
 +  // Complex pattern selectors
 +  bool SelectADDRParam(SDValue Addr, SDValue& R1, SDValue& R2);
@@ -8046,9 +8340,11 @@ index 0000000..d15ed39
 +  static bool isLocalLoad(const LoadSDNode *N);
 +  static bool isRegionLoad(const LoadSDNode *N);
 +
-+  bool SelectADDR8BitOffset(SDValue Addr, SDValue& Base, SDValue& Offset);
-+  bool SelectADDRReg(SDValue Addr, SDValue& Base, SDValue& Offset);
++  bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
++  bool SelectGlobalValueVariableOffset(SDValue Addr,
++      SDValue &BaseReg, SDValue& Offset);
 +  bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);
++  bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);
 +
 +  // Include the pieces autogenerated from the target description.
 +#include "AMDGPUGenDAGISel.inc"
@@ -8135,16 +8431,6 @@ index 0000000..d15ed39
 +  }
 +  switch (Opc) {
 +  default: break;
-+  case ISD::FrameIndex: {
-+    if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(N)) {
-+      unsigned int FI = FIN->getIndex();
-+      EVT OpVT = N->getValueType(0);
-+      unsigned int NewOpc = AMDGPU::COPY;
-+      SDValue TFI = CurDAG->getTargetFrameIndex(FI, MVT::i32);
-+      return CurDAG->SelectNodeTo(N, NewOpc, OpVT, TFI);
-+    }
-+    break;
-+  }
 +  case ISD::ConstantFP:
 +  case ISD::Constant: {
 +    const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
@@ -8203,7 +8489,9 @@ index 0000000..d15ed39
 +            continue;
 +          }
 +      } else {
-+        if (!TII->isALUInstr(Use->getMachineOpcode())) {
++        if (!TII->isALUInstr(Use->getMachineOpcode()) ||
++            (TII->get(Use->getMachineOpcode()).TSFlags &
++            R600_InstFlag::VECTOR)) {
 +          continue;
 +        }
 +
@@ -8238,7 +8526,116 @@ index 0000000..d15ed39
 +    break;
 +  }
 +  }
-+  return SelectCode(N);
++  SDNode *Result = SelectCode(N);
++
++  // Fold operands of selected node
++
++  const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
++  if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
++    const R600InstrInfo *TII =
++        static_cast<const R600InstrInfo*>(TM.getInstrInfo());
++    if (Result && Result->isMachineOpcode() &&
++        !(TII->get(Result->getMachineOpcode()).TSFlags & R600_InstFlag::VECTOR)
++        && TII->isALUInstr(Result->getMachineOpcode())) {
++      // Fold FNEG/FABS/CONST_ADDRESS
++      // TODO: Isel can generate multiple MachineInstrs; we need to recursively
++      // parse Result
++      bool IsModified = false;
++      do {
++        std::vector<SDValue> Ops;
++        for(SDNode::op_iterator I = Result->op_begin(), E = Result->op_end();
++            I != E; ++I)
++          Ops.push_back(*I);
++        IsModified = FoldOperands(Result->getMachineOpcode(), TII, Ops);
++        if (IsModified) {
++          Result = CurDAG->UpdateNodeOperands(Result, Ops.data(), Ops.size());
++        }
++      } while (IsModified);
++
++      // If node has a single use which is CLAMP_R600, folds it
++      if (Result->hasOneUse() && Result->isMachineOpcode()) {
++        SDNode *PotentialClamp = *Result->use_begin();
++        if (PotentialClamp->isMachineOpcode() &&
++            PotentialClamp->getMachineOpcode() == AMDGPU::CLAMP_R600) {
++          unsigned ClampIdx =
++            TII->getOperandIdx(Result->getMachineOpcode(), R600Operands::CLAMP);
++          std::vector<SDValue> Ops;
++          unsigned NumOp = Result->getNumOperands();
++          for (unsigned i = 0; i < NumOp; ++i) {
++            Ops.push_back(Result->getOperand(i));
++          }
++          Ops[ClampIdx - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++          Result = CurDAG->SelectNodeTo(PotentialClamp,
++              Result->getMachineOpcode(), PotentialClamp->getVTList(),
++              Ops.data(), NumOp);
++        }
++      }
++    }
++  }
++
++  return Result;
++}
++
++bool AMDGPUDAGToDAGISel::FoldOperands(unsigned Opcode,
++    const R600InstrInfo *TII, std::vector<SDValue> &Ops) {
++  int OperandIdx[] = {
++    TII->getOperandIdx(Opcode, R600Operands::SRC0),
++    TII->getOperandIdx(Opcode, R600Operands::SRC1),
++    TII->getOperandIdx(Opcode, R600Operands::SRC2)
++  };
++  int SelIdx[] = {
++    TII->getOperandIdx(Opcode, R600Operands::SRC0_SEL),
++    TII->getOperandIdx(Opcode, R600Operands::SRC1_SEL),
++    TII->getOperandIdx(Opcode, R600Operands::SRC2_SEL)
++  };
++  int NegIdx[] = {
++    TII->getOperandIdx(Opcode, R600Operands::SRC0_NEG),
++    TII->getOperandIdx(Opcode, R600Operands::SRC1_NEG),
++    TII->getOperandIdx(Opcode, R600Operands::SRC2_NEG)
++  };
++  int AbsIdx[] = {
++    TII->getOperandIdx(Opcode, R600Operands::SRC0_ABS),
++    TII->getOperandIdx(Opcode, R600Operands::SRC1_ABS),
++    -1
++  };
++
++  for (unsigned i = 0; i < 3; i++) {
++    if (OperandIdx[i] < 0)
++      return false;
++    SDValue Operand = Ops[OperandIdx[i] - 1];
++    switch (Operand.getOpcode()) {
++    case AMDGPUISD::CONST_ADDRESS: {
++      if (i == 2)
++        break;
++      SDValue CstOffset;
++      if (!Operand.getValueType().isVector() &&
++          SelectGlobalValueConstantOffset(Operand.getOperand(0), CstOffset)) {
++        Ops[OperandIdx[i] - 1] = CurDAG->getRegister(AMDGPU::ALU_CONST, MVT::f32);
++        Ops[SelIdx[i] - 1] = CstOffset;
++        return true;
++      }
++      }
++      break;
++    case ISD::FNEG:
++      if (NegIdx[i] < 0)
++        break;
++      Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++      Ops[NegIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++      return true;
++    case ISD::FABS:
++      if (AbsIdx[i] < 0)
++        break;
++      Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++      Ops[AbsIdx[i] - 1] = CurDAG->getTargetConstant(1, MVT::i32);
++      return true;
++    case ISD::BITCAST:
++      Ops[OperandIdx[i] - 1] = Operand.getOperand(0);
++      return true;
++    default:
++      break;
++    }
++  }
++  return false;
 +}
 +
 +bool AMDGPUDAGToDAGISel::checkType(const Value *ptr, unsigned int addrspace) {
@@ -8385,41 +8782,23 @@ index 0000000..d15ed39
 +
 +///==== AMDGPU Functions ====///
 +
-+bool AMDGPUDAGToDAGISel::SelectADDR8BitOffset(SDValue Addr, SDValue& Base,
-+                                             SDValue& Offset) {
-+  if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
-+      Addr.getOpcode() == ISD::TargetGlobalAddress) {
-+    return false;
++bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
++    SDValue& IntPtr) {
++  if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
++    IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, true);
++    return true;
 +  }
++  return false;
++}
 +
-+
-+  if (Addr.getOpcode() == ISD::ADD) {
-+    bool Match = false;
-+
-+    // Find the base ptr and the offset
-+    for (unsigned i = 0; i < Addr.getNumOperands(); i++) {
-+      SDValue Arg = Addr.getOperand(i);
-+      ConstantSDNode * OffsetNode = dyn_cast<ConstantSDNode>(Arg);
-+      // This arg isn't a constant so it must be the base PTR.
-+      if (!OffsetNode) {
-+        Base = Addr.getOperand(i);
-+        continue;
-+      }
-+      // Check if the constant argument fits in 8-bits.  The offset is in bytes
-+      // so we need to convert it to dwords.
-+      if (isUInt<8>(OffsetNode->getZExtValue() >> 2)) {
-+        Match = true;
-+        Offset = CurDAG->getTargetConstant(OffsetNode->getZExtValue() >> 2,
-+                                           MVT::i32);
-+      }
-+    }
-+    return Match;
++bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
++    SDValue& BaseReg, SDValue &Offset) {
++  if (!dyn_cast<ConstantSDNode>(Addr)) {
++    BaseReg = Addr;
++    Offset = CurDAG->getIntPtrConstant(0, true);
++    return true;
 +  }
-+
-+  // Default case, no offset
-+  Base = Addr;
-+  Offset = CurDAG->getTargetConstant(0, MVT::i32);
-+  return true;
++  return false;
 +}
 +
 +bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
@@ -8449,16 +8828,21 @@ index 0000000..d15ed39
 +  return true;
 +}
 +
-+bool AMDGPUDAGToDAGISel::SelectADDRReg(SDValue Addr, SDValue& Base,
-+                                      SDValue& Offset) {
-+  if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
-+      Addr.getOpcode() == ISD::TargetGlobalAddress  ||
-+      Addr.getOpcode() != ISD::ADD) {
-+    return false;
-+  }
++bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
++                                            SDValue &Offset) {
++  ConstantSDNode *C;
 +
-+  Base = Addr.getOperand(0);
-+  Offset = Addr.getOperand(1);
++  if ((C = dyn_cast<ConstantSDNode>(Addr))) {
++    Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);
++    Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
++  } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
++            (C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
++    Base = Addr.getOperand(0);
++    Offset = CurDAG->getTargetConstant(C->getZExtValue(), MVT::i32);
++  } else {
++    Base = Addr;
++    Offset = CurDAG->getTargetConstant(0, MVT::i32);
++  }
 +
 +  return true;
 +}
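
The new FoldOperands() above absorbs an operand's producer into the ALU
instruction's own source-modifier fields where one exists, subject to the
extra checks in the code (no constant fold into SRC2, scalar type only,
constant offset required). A condensed restatement as toy standalone code,
not the LLVM implementation:

    // Which producers fold into which per-source fields, following the
    // switch in FoldOperands() (hypothetical enum, not LLVM types):
    enum Producer { ConstAddress, FNeg, FAbs, Bitcast, Other };

    static bool foldsIntoSource(Producer P, bool HasNegField, bool HasAbsField) {
      switch (P) {
      case ConstAddress: return true;         // becomes ALU_CONST + SRCn_SEL index
      case FNeg:         return HasNegField;  // sets the SRCn_NEG flag instead
      case FAbs:         return HasAbsField;  // sets SRCn_ABS (SRC2 has no ABS)
      case Bitcast:      return true;         // just looks through the cast
      default:           return false;
      }
    }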
@@ -9857,10 +10241,10 @@ index 0000000..bc7df37
 +#endif // AMDILNIDEVICE_H
 diff --git a/lib/Target/R600/AMDILPeepholeOptimizer.cpp b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
 new file mode 100644
-index 0000000..4a748b8
+index 0000000..57317ac
 --- /dev/null
 +++ b/lib/Target/R600/AMDILPeepholeOptimizer.cpp
-@@ -0,0 +1,1215 @@
+@@ -0,0 +1,1256 @@
 +//===-- AMDILPeepholeOptimizer.cpp - AMDGPU Peephole optimizations ---------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -10409,14 +10793,51 @@ index 0000000..4a748b8
 +  lhsMaskOffset = lhsMaskVal ? CountTrailingZeros_32(lhsMaskVal) : lhsShiftVal;
 +  rhsMaskOffset = rhsMaskVal ? CountTrailingZeros_32(rhsMaskVal) : rhsShiftVal;
 +  // TODO: Handle the case of A & B | D & ~B(i.e. inverted masks).
++  if (mDebug) {
++      dbgs() << "Found pattern: \'((A" << (LHSMask ? " & B)" : ")");
++      dbgs() << (LHSShift ? " << C)" : ")") << " | ((D" ;
++      dbgs() << (RHSMask ? " & E)" : ")");
++      dbgs() << (RHSShift ? " << F)\'\n" : ")\'\n");
++      dbgs() << "A = LHSSrc\t\tD = RHSSrc \n";
++      dbgs() << "B = " << lhsMaskVal << "\t\tE = " << rhsMaskVal << "\n";
++      dbgs() << "C = " << lhsShiftVal << "\t\tF = " << rhsShiftVal << "\n";
++      dbgs() << "width(B) = " << lhsMaskWidth;
++      dbgs() << "\twidth(E) = " << rhsMaskWidth << "\n";
++      dbgs() << "offset(B) = " << lhsMaskOffset;
++      dbgs() << "\toffset(E) = " << rhsMaskOffset << "\n";
++      dbgs() << "Constraints: \n";
++      dbgs() << "\t(1) B ^ E != 0\n";
++      dbgs() << "\t(2-LHS) B is a mask\n";
++      dbgs() << "\t(2-RHS) E is a mask\n";
++      dbgs() << "\t(3-LHS) (offset(B)) >= (width(E) + offset(E))\n";
++      dbgs() << "\t(3-RHS) (offset(E)) >= (width(B) + offset(B))\n";
++  }
 +  if ((lhsMaskVal || rhsMaskVal) && !(lhsMaskVal ^ rhsMaskVal)) {
++    if (mDebug) {
++      dbgs() << lhsMaskVal << " ^ " << rhsMaskVal;
++      dbgs() << " = " << (lhsMaskVal ^ rhsMaskVal) << "\n";
++      dbgs() << "Failed constraint 1!\n";
++    }
 +    return false;
 +  }
++  if (mDebug) {
++    dbgs() << "LHS = " << lhsMaskOffset << "";
++    dbgs() << " >= (" << rhsMaskWidth << " + " << rhsMaskOffset << ") = ";
++    dbgs() << (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset));
++    dbgs() << "\nRHS = " << rhsMaskOffset << "";
++    dbgs() << " >= (" << lhsMaskWidth << " + " << lhsMaskOffset << ") = ";
++    dbgs() << (rhsMaskOffset >= (lhsMaskWidth + lhsMaskOffset));
++    dbgs() << "\n";
++  }
 +  if (lhsMaskOffset >= (rhsMaskWidth + rhsMaskOffset)) {
 +    offset = ConstantInt::get(aType, lhsMaskOffset, false);
 +    width = ConstantInt::get(aType, lhsMaskWidth, false);
 +    RHSSrc = RHS;
 +    if (!isMask_32(lhsMaskVal) && !isShiftedMask_32(lhsMaskVal)) {
++      if (mDebug) {
++        dbgs() << "Value is not a Mask: " << lhsMaskVal << "\n";
++        dbgs() << "Failed constraint 2!\n";
++      }
 +      return false;
 +    }
 +    if (!LHSShift) {
@@ -10435,6 +10856,10 @@ index 0000000..4a748b8
 +    LHSSrc = RHSSrc;
 +    RHSSrc = LHS;
 +    if (!isMask_32(rhsMaskVal) && !isShiftedMask_32(rhsMaskVal)) {
++      if (mDebug) {
++        dbgs() << "Non-Mask: " << rhsMaskVal << "\n";
++        dbgs() << "Failed constraint 2!\n";
++      }
 +      return false;
 +    }
 +    if (!RHSShift) {
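
For reference, a concrete instance of the ((A & B) << C) | ((D & E) << F)
pattern this routine matches, with C = F = 0 and disjoint halfword masks; the
constants are illustrative only:

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t A = 0x1234abcd, D = 0x00005678;
      // B = 0xffff0000 (offset 16, width 16), E = 0x0000ffff (offset 0,
      // width 16): the masks differ (constraint 1) and offset(B) >=
      // width(E) + offset(E) (constraint 3-LHS), so the pass may rewrite
      // this merge as a single bit-insert operation.
      uint32_t Merged = (A & 0xffff0000u) | (D & 0x0000ffffu);
      assert(Merged == 0x12345678u);
      return 0;
    }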
@@ -11287,10 +11712,10 @@ index 0000000..5b2cb25
 +#endif // AMDILSIDEVICE_H
 diff --git a/lib/Target/R600/CMakeLists.txt b/lib/Target/R600/CMakeLists.txt
 new file mode 100644
-index 0000000..ce0b56b
+index 0000000..8ef9f8c
 --- /dev/null
 +++ b/lib/Target/R600/CMakeLists.txt
-@@ -0,0 +1,55 @@
+@@ -0,0 +1,56 @@
 +set(LLVM_TARGET_DEFINITIONS AMDGPU.td)
 +
 +tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)
@@ -11304,7 +11729,7 @@ index 0000000..ce0b56b
 +tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
 +add_public_tablegen_target(AMDGPUCommonTableGen)
 +
-+add_llvm_target(R600CodeGen
++add_llvm_target(AMDGPUCodeGen
 +  AMDIL7XXDevice.cpp
 +  AMDILCFGStructurizer.cpp
 +  AMDILDevice.cpp
@@ -11318,9 +11743,9 @@ index 0000000..ce0b56b
 +  AMDILPeepholeOptimizer.cpp
 +  AMDILSIDevice.cpp
 +  AMDGPUAsmPrinter.cpp
++  AMDGPUIndirectAddressing.cpp
 +  AMDGPUMCInstLower.cpp
 +  AMDGPUSubtarget.cpp
-+  AMDGPUStructurizeCFG.cpp
 +  AMDGPUTargetMachine.cpp
 +  AMDGPUISelLowering.cpp
 +  AMDGPUConvertToISA.cpp
@@ -11329,9 +11754,9 @@ index 0000000..ce0b56b
 +  R600ExpandSpecialInstrs.cpp
 +  R600InstrInfo.cpp
 +  R600ISelLowering.cpp
++  R600LowerConstCopy.cpp
 +  R600MachineFunctionInfo.cpp
 +  R600RegisterInfo.cpp
-+  SIAnnotateControlFlow.cpp
 +  SIAssignInterpRegs.cpp
 +  SIInstrInfo.cpp
 +  SIISelLowering.cpp
@@ -11339,6 +11764,7 @@ index 0000000..ce0b56b
 +  SILowerControlFlow.cpp
 +  SIMachineFunctionInfo.cpp
 +  SIRegisterInfo.cpp
++  SIFixSGPRLiveness.cpp
 +  )
 +
 +add_dependencies(LLVMR600CodeGen intrinsics_gen)
@@ -11348,10 +11774,10 @@ index 0000000..ce0b56b
 +add_subdirectory(MCTargetDesc)
 diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
 new file mode 100644
-index 0000000..e6c550b
+index 0000000..d6450a0
 --- /dev/null
 +++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.cpp
-@@ -0,0 +1,132 @@
+@@ -0,0 +1,168 @@
 +//===-- AMDGPUInstPrinter.cpp - AMDGPU MC Inst -> ASM ---------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -11394,6 +11820,21 @@ index 0000000..e6c550b
 +  }
 +}
 +
++void AMDGPUInstPrinter::printInterpSlot(const MCInst *MI, unsigned OpNum,
++                                        raw_ostream &O) {
++  unsigned Imm = MI->getOperand(OpNum).getImm();
++
++  if (Imm == 2) {
++    O << "P0";
++  } else if (Imm == 1) {
++    O << "P20";
++  } else if (Imm == 0) {
++    O << "P10";
++  } else {
++    assert(!"Invalid interpolation parameter slot");
++  }
++}
++
 +void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,
 +                                        raw_ostream &O) {
 +  printOperand(MI, OpNo, O);
@@ -11459,10 +11900,7 @@ index 0000000..e6c550b
 +
 +void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo,
 +                                 raw_ostream &O) {
-+  const MCOperand &Op = MI->getOperand(OpNo);
-+  if (Op.getImm() != 0) {
-+    O << " + " << Op.getImm();
-+  }
++  printIfSet(MI, OpNo, O, "+");
 +}
 +
 +void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo,
@@ -11483,13 +11921,37 @@ index 0000000..e6c550b
 +  }
 +}
 +
++void AMDGPUInstPrinter::printSel(const MCInst *MI, unsigned OpNo,
++                                  raw_ostream &O) {
++  const char * chans = "XYZW";
++  int sel = MI->getOperand(OpNo).getImm();
++
++  int chan = sel & 3;
++  sel >>= 2;
++
++  if (sel >= 512) {
++    sel -= 512;
++    int cb = sel >> 12;
++    sel &= 4095;
++    O << cb << "[" << sel << "]";
++  } else if (sel >= 448) {
++    sel -= 448;
++    O << sel;
++  } else if (sel >= 0) {
++    O << sel;
++  }
++
++  if (sel >= 0)
++    O << "." << chans[chan];
++}
++
 +#include "AMDGPUGenAsmWriter.inc"
 diff --git a/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
 new file mode 100644
-index 0000000..96e0e46
+index 0000000..767a708
 --- /dev/null
 +++ b/lib/Target/R600/InstPrinter/AMDGPUInstPrinter.h
-@@ -0,0 +1,52 @@
+@@ -0,0 +1,54 @@
 +//===-- AMDGPUInstPrinter.h - AMDGPU MC Inst -> ASM interface ---*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -11525,6 +11987,7 @@ index 0000000..96e0e46
 +
 +private:
 +  void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
++  void printInterpSlot(const MCInst *MI, unsigned OpNum, raw_ostream &O);
 +  void printMemOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
 +  void printIfSet(const MCInst *MI, unsigned OpNo, raw_ostream &O, StringRef Asm);
 +  void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);
@@ -11537,6 +12000,7 @@ index 0000000..96e0e46
 +  void printUpdateExecMask(const MCInst *MI, unsigned OpNo, raw_ostream &O);
 +  void printUpdatePred(const MCInst *MI, unsigned OpNo, raw_ostream &O);
 +  void printWrite(const MCInst *MI, unsigned OpNo, raw_ostream &O);
++  void printSel(const MCInst *MI, unsigned OpNo, raw_ostream &O);
 +};
 +
 +} // End namespace llvm
@@ -11544,7 +12008,7 @@ index 0000000..96e0e46
 +#endif // AMDGPUINSTRPRINTER_H
 diff --git a/lib/Target/R600/InstPrinter/CMakeLists.txt b/lib/Target/R600/InstPrinter/CMakeLists.txt
 new file mode 100644
-index 0000000..069c55b
+index 0000000..6776337
 --- /dev/null
 +++ b/lib/Target/R600/InstPrinter/CMakeLists.txt
 @@ -0,0 +1,7 @@
@@ -11554,7 +12018,7 @@ index 0000000..069c55b
 +  AMDGPUInstPrinter.cpp
 +  )
 +
-+add_dependencies(LLVMR600AsmPrinter AMDGPUCommonTableGen)
++add_dependencies(LLVMR600AsmPrinter R600CommonTableGen)
 diff --git a/lib/Target/R600/InstPrinter/LLVMBuild.txt b/lib/Target/R600/InstPrinter/LLVMBuild.txt
 new file mode 100644
 index 0000000..ec0be89
@@ -11869,10 +12333,10 @@ index 0000000..3ad0fa6
 +#endif // AMDGPUMCASMINFO_H
 diff --git a/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
 new file mode 100644
-index 0000000..9d0d6cf
+index 0000000..8721f80
 --- /dev/null
 +++ b/lib/Target/R600/MCTargetDesc/AMDGPUMCCodeEmitter.h
-@@ -0,0 +1,60 @@
+@@ -0,0 +1,49 @@
 +//===-- AMDGPUCodeEmitter.h - AMDGPU Code Emitter interface -----------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -11917,17 +12381,6 @@ index 0000000..9d0d6cf
 +                                   SmallVectorImpl<MCFixup> &Fixups) const {
 +    return 0;
 +  }
-+  virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const {
-+    return Value;
-+  }
-+  virtual uint64_t i32LiteralEncode(const MCInst &MI, unsigned OpNo,
-+                                   SmallVectorImpl<MCFixup> &Fixups) const {
-+    return 0;
-+  }
-+  virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+                                   SmallVectorImpl<MCFixup> &Fixups) const {
-+    return 0;
-+  }
 +};
 +
 +} // End namespace llvm
@@ -12182,10 +12635,10 @@ index 0000000..8894a76
 +include $(LEVEL)/Makefile.common
 diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
 new file mode 100644
-index 0000000..dc91924
+index 0000000..115fe8d
 --- /dev/null
 +++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
-@@ -0,0 +1,575 @@
+@@ -0,0 +1,582 @@
 +//===- R600MCCodeEmitter.cpp - Code Emitter for R600->Cayman GPU families -===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -12252,8 +12705,8 @@ index 0000000..dc91924
 +  void EmitALUInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
 +                    raw_ostream &OS) const;
 +  void EmitSrc(const MCInst &MI, unsigned OpIdx, raw_ostream &OS) const;
-+  void EmitSrcISA(const MCInst &MI, unsigned OpIdx, uint64_t &Value,
-+                  raw_ostream &OS) const;
++  void EmitSrcISA(const MCInst &MI, unsigned RegOpIdx, unsigned SelOpIdx,
++                    raw_ostream &OS) const;
 +  void EmitDst(const MCInst &MI, raw_ostream &OS) const;
 +  void EmitTexInstr(const MCInst &MI, SmallVectorImpl<MCFixup> &Fixups,
 +                    raw_ostream &OS) const;
@@ -12350,9 +12803,12 @@ index 0000000..dc91924
 +    case AMDGPU::VTX_READ_PARAM_8_eg:
 +    case AMDGPU::VTX_READ_PARAM_16_eg:
 +    case AMDGPU::VTX_READ_PARAM_32_eg:
++    case AMDGPU::VTX_READ_PARAM_128_eg:
 +    case AMDGPU::VTX_READ_GLOBAL_8_eg:
 +    case AMDGPU::VTX_READ_GLOBAL_32_eg:
-+    case AMDGPU::VTX_READ_GLOBAL_128_eg: {
++    case AMDGPU::VTX_READ_GLOBAL_128_eg:
++    case AMDGPU::TEX_VTX_CONSTBUF:
++    case AMDGPU::TEX_VTX_TEXBUF : {
 +      uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups);
 +      uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
 +
@@ -12382,7 +12838,6 @@ index 0000000..dc91924
 +                                     SmallVectorImpl<MCFixup> &Fixups,
 +                                     raw_ostream &OS) const {
 +  const MCInstrDesc &MCDesc = MCII.get(MI.getOpcode());
-+  unsigned NumOperands = MI.getNumOperands();
 +
 +  // Emit instruction type
 +  EmitByte(INSTR_ALU, OS);
@@ -12398,19 +12853,21 @@ index 0000000..dc91924
 +    InstWord01 |= ISAOpCode << 1;
 +  }
 +
-+  unsigned SrcIdx = 0;
-+  for (unsigned int OpIdx = 1; OpIdx < NumOperands; ++OpIdx) {
-+    if (MI.getOperand(OpIdx).isImm() || MI.getOperand(OpIdx).isFPImm() ||
-+        OpIdx == (unsigned)MCDesc.findFirstPredOperandIdx()) {
-+      continue;
-+    }
-+    EmitSrcISA(MI, OpIdx, InstWord01, OS);
-+    SrcIdx++;
-+  }
++  unsigned SrcNum = MCDesc.TSFlags & R600_InstFlag::OP3 ? 3 :
++      MCDesc.TSFlags & R600_InstFlag::OP2 ? 2 : 1;
++
++  EmitByte(SrcNum, OS);
++
++  const unsigned SrcOps[3][2] = {
++      {R600Operands::SRC0, R600Operands::SRC0_SEL},
++      {R600Operands::SRC1, R600Operands::SRC1_SEL},
++      {R600Operands::SRC2, R600Operands::SRC2_SEL}
++  };
 +
-+  // Emit zeros for unused sources
-+  for ( ; SrcIdx < 3; SrcIdx++) {
-+    EmitNullBytes(SRC_BYTE_COUNT - 6, OS);
++  for (unsigned SrcIdx = 0; SrcIdx < SrcNum; ++SrcIdx) {
++    unsigned RegOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][0]];
++    unsigned SelOpIdx = R600Operands::ALUOpTable[SrcNum-1][SrcOps[SrcIdx][1]];
++    EmitSrcISA(MI, RegOpIdx, SelOpIdx, OS);
 +  }
 +
 +  Emit(InstWord01, OS);
@@ -12481,34 +12938,37 @@ index 0000000..dc91924
 +
 +}
 +
-+void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned OpIdx,
-+                                   uint64_t &Value, raw_ostream &OS) const {
-+  const MCOperand &MO = MI.getOperand(OpIdx);
++void R600MCCodeEmitter::EmitSrcISA(const MCInst &MI, unsigned RegOpIdx,
++                                   unsigned SelOpIdx, raw_ostream &OS) const {
++  const MCOperand &RegMO = MI.getOperand(RegOpIdx);
++  const MCOperand &SelMO = MI.getOperand(SelOpIdx);
++
 +  union {
 +    float f;
 +    uint32_t i;
 +  } InlineConstant;
 +  InlineConstant.i = 0;
-+  // Emit the source select (2 bytes).  For GPRs, this is the register index.
-+  // For other potential instruction operands, (e.g. constant registers) the
-+  // value of the source select is defined in the r600isa docs.
-+  if (MO.isReg()) {
-+    unsigned Reg = MO.getReg();
-+    if (AMDGPUMCRegisterClasses[AMDGPU::R600_CReg32RegClassID].contains(Reg)) {
-+      EmitByte(1, OS);
-+    } else {
-+      EmitByte(0, OS);
-+    }
++  // Emit source type (1 byte) and source select (4 bytes). For GPRs, type is 0
++  // and select is 0 (the GPR index is encoded in the instruction encoding). For
++  // constants, type is 1 and select is the original const select from the driver.
++  unsigned Reg = RegMO.getReg();
++  if (Reg == AMDGPU::ALU_CONST) {
++    EmitByte(1, OS);
++    uint32_t Sel = SelMO.getImm();
++    Emit(Sel, OS);
++  } else {
++    EmitByte(0, OS);
++    Emit((uint32_t)0, OS);
++  }
 +
-+    if (Reg == AMDGPU::ALU_LITERAL_X) {
-+      unsigned ImmOpIndex = MI.getNumOperands() - 1;
-+      MCOperand ImmOp = MI.getOperand(ImmOpIndex);
-+      if (ImmOp.isFPImm()) {
-+        InlineConstant.f = ImmOp.getFPImm();
-+      } else {
-+        assert(ImmOp.isImm());
-+        InlineConstant.i = ImmOp.getImm();
-+      }
++  if (Reg == AMDGPU::ALU_LITERAL_X) {
++    unsigned ImmOpIndex = MI.getNumOperands() - 1;
++    MCOperand ImmOp = MI.getOperand(ImmOpIndex);
++    if (ImmOp.isFPImm()) {
++      InlineConstant.f = ImmOp.getFPImm();
++    } else {
++      assert(ImmOp.isImm());
++      InlineConstant.i = ImmOp.getImm();
 +    }
 +  }
 +
@@ -12763,10 +13223,10 @@ index 0000000..dc91924
 +#include "AMDGPUGenMCCodeEmitter.inc"
 diff --git a/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
 new file mode 100644
-index 0000000..c47dc99
+index 0000000..6dfbbe8
 --- /dev/null
 +++ b/lib/Target/R600/MCTargetDesc/SIMCCodeEmitter.cpp
-@@ -0,0 +1,298 @@
+@@ -0,0 +1,235 @@
 +//===-- SIMCCodeEmitter.cpp - SI Code Emitter -------------------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -12793,38 +13253,16 @@ index 0000000..c47dc99
 +#include "llvm/MC/MCFixup.h"
 +#include "llvm/Support/raw_ostream.h"
 +
-+#define VGPR_BIT(src_idx) (1ULL << (9 * src_idx - 1))
-+#define SI_INSTR_FLAGS_ENCODING_MASK 0xf
-+
-+// These must be kept in sync with SIInstructions.td and also the
-+// InstrEncodingInfo array in SIInstrInfo.cpp.
-+//
-+// NOTE: This enum is only used to identify the encoding type within LLVM,
-+// the actual encoding type that is part of the instruction format is different
-+namespace SIInstrEncodingType {
-+  enum Encoding {
-+    EXP = 0,
-+    LDS = 1,
-+    MIMG = 2,
-+    MTBUF = 3,
-+    MUBUF = 4,
-+    SMRD = 5,
-+    SOP1 = 6,
-+    SOP2 = 7,
-+    SOPC = 8,
-+    SOPK = 9,
-+    SOPP = 10,
-+    VINTRP = 11,
-+    VOP1 = 12,
-+    VOP2 = 13,
-+    VOP3 = 14,
-+    VOPC = 15
-+  };
-+}
-+
 +using namespace llvm;
 +
 +namespace {
++
++/// \brief Helper type used in encoding
++typedef union {
++  int32_t I;
++  float F;
++} IntFloatUnion;
++
 +class SIMCCodeEmitter : public  AMDGPUMCCodeEmitter {
 +  SIMCCodeEmitter(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
 +  void operator=(const SIMCCodeEmitter &); // DO NOT IMPLEMENT
@@ -12833,6 +13271,15 @@ index 0000000..c47dc99
 +  const MCSubtargetInfo &STI;
 +  MCContext &Ctx;
 +
++  /// \brief Encode a sequence of registers with the correct alignment.
++  unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
++
++  /// \brief Can this operand also contain immediate values?
++  bool isSrcOperand(const MCInstrDesc &Desc, unsigned OpNo) const;
++
++  /// \brief Encode an fp or int literal
++  uint32_t getLitEncoding(const MCOperand &MO) const;
++
 +public:
 +  SIMCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri,
 +                  const MCSubtargetInfo &sti, MCContext &ctx)
@@ -12848,11 +13295,6 @@ index 0000000..c47dc99
 +  virtual uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,
 +                                     SmallVectorImpl<MCFixup> &Fixups) const;
 +
-+public:
-+
-+  /// \brief Encode a sequence of registers with the correct alignment.
-+  unsigned GPRAlign(const MCInst &MI, unsigned OpNo, unsigned shift) const;
-+
 +  /// \brief Encoding for when 2 consecutive registers are used
 +  virtual unsigned GPR2AlignEncode(const MCInst &MI, unsigned OpNo,
 +                                   SmallVectorImpl<MCFixup> &Fixup) const;
@@ -12860,73 +13302,142 @@ index 0000000..c47dc99
 +  /// \brief Encoding for when 4 consecutive registers are used
 +  virtual unsigned GPR4AlignEncode(const MCInst &MI, unsigned OpNo,
 +                                   SmallVectorImpl<MCFixup> &Fixup) const;
++};
 +
-+  /// \brief Encoding for SMRD indexed loads
-+  virtual uint32_t SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+                                   SmallVectorImpl<MCFixup> &Fixup) const;
++} // End anonymous namespace
++
++MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
++                                           const MCRegisterInfo &MRI,
++                                           const MCSubtargetInfo &STI,
++                                           MCContext &Ctx) {
++  return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
++}
 +
-+  /// \brief Post-Encoder method for VOP instructions
-+  virtual uint64_t VOPPostEncode(const MCInst &MI, uint64_t Value) const;
++bool SIMCCodeEmitter::isSrcOperand(const MCInstrDesc &Desc,
++                                   unsigned OpNo) const {
 +
-+private:
++  unsigned RegClass = Desc.OpInfo[OpNo].RegClass;
++  return (AMDGPU::SSrc_32RegClassID == RegClass) ||
++         (AMDGPU::SSrc_64RegClassID == RegClass) ||
++         (AMDGPU::VSrc_32RegClassID == RegClass) ||
++         (AMDGPU::VSrc_64RegClassID == RegClass);
++}
 +
-+  /// \returns this SIInstrEncodingType for this instruction.
-+  unsigned getEncodingType(const MCInst &MI) const;
++uint32_t SIMCCodeEmitter::getLitEncoding(const MCOperand &MO) const {
 +
-+  /// \brief Get then size in bytes of this instructions encoding.
-+  unsigned getEncodingBytes(const MCInst &MI) const;
++  IntFloatUnion Imm;
++  if (MO.isImm())
++    Imm.I = MO.getImm();
++  else if (MO.isFPImm())
++    Imm.F = MO.getFPImm();
++  else
++    return ~0;
 +
-+  /// \returns the hardware encoding for a register
-+  unsigned getRegBinaryCode(unsigned reg) const;
++  if (Imm.I >= 0 && Imm.I <= 64)
++    return 128 + Imm.I;
 +
-+  /// \brief Generated function that returns the hardware encoding for
-+  /// a register
-+  unsigned getHWRegNum(unsigned reg) const;
++  if (Imm.I >= -16 && Imm.I <= -1)
++    return 192 + abs(Imm.I);
 +
-+};
++  if (Imm.F == 0.5f)
++    return 240;
 +
-+} // End anonymous namespace
++  if (Imm.F == -0.5f)
++    return 241;
 +
-+MCCodeEmitter *llvm::createSIMCCodeEmitter(const MCInstrInfo &MCII,
-+                                           const MCRegisterInfo &MRI,
-+                                           const MCSubtargetInfo &STI,
-+                                           MCContext &Ctx) {
-+  return new SIMCCodeEmitter(MCII, MRI, STI, Ctx);
++  if (Imm.F == 1.0f)
++    return 242;
++
++  if (Imm.F == -1.0f)
++    return 243;
++
++  if (Imm.F == 2.0f)
++    return 244;
++
++  if (Imm.F == -2.0f)
++    return 245;
++
++  if (Imm.F == 4.0f)
++    return 246;
++
++  if (Imm.F == -4.0f)
++    return 247;
++
++  return 255;
 +}
 +
 +void SIMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
 +                                       SmallVectorImpl<MCFixup> &Fixups) const {
++
 +  uint64_t Encoding = getBinaryCodeForInstr(MI, Fixups);
-+  unsigned bytes = getEncodingBytes(MI);
++  const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
++  unsigned bytes = Desc.getSize();
++
 +  for (unsigned i = 0; i < bytes; i++) {
 +    OS.write((uint8_t) ((Encoding >> (8 * i)) & 0xff));
 +  }
++
++  if (bytes > 4)
++    return;
++
++  // Check for additional literals in SRC0/1/2 (Op 1/2/3)
++  for (unsigned i = 0, e = MI.getNumOperands(); i < e; ++i) {
++
++    // Check if this operand should be encoded as [SV]Src
++    if (!isSrcOperand(Desc, i))
++      continue;
++
++    // Is this operand a literal immediate?
++    const MCOperand &Op = MI.getOperand(i);
++    if (getLitEncoding(Op) != 255)
++      continue;
++
++    // Yes! Encode it
++    IntFloatUnion Imm;
++    if (Op.isImm())
++      Imm.I = Op.getImm();
++    else
++      Imm.F = Op.getFPImm();
++
++    for (unsigned j = 0; j < 4; j++) {
++      OS.write((uint8_t) ((Imm.I >> (8 * j)) & 0xff));
++    }
++
++    // Only one literal value allowed
++    break;
++  }
 +}
 +
 +uint64_t SIMCCodeEmitter::getMachineOpValue(const MCInst &MI,
 +                                            const MCOperand &MO,
 +                                       SmallVectorImpl<MCFixup> &Fixups) const {
-+  if (MO.isReg()) {
-+    return getRegBinaryCode(MO.getReg());
-+  } else if (MO.isImm()) {
-+    return MO.getImm();
-+  } else if (MO.isFPImm()) {
-+    // XXX: Not all instructions can use inline literals
-+    // XXX: We should make sure this is a 32-bit constant
-+    union {
-+      float F;
-+      uint32_t I;
-+    } Imm;
-+    Imm.F = MO.getFPImm();
-+    return Imm.I;
-+  } else if (MO.isExpr()) {
++  if (MO.isReg())
++    return MRI.getEncodingValue(MO.getReg());
++
++  if (MO.isExpr()) {
 +    const MCExpr *Expr = MO.getExpr();
 +    MCFixupKind Kind = MCFixupKind(FK_PCRel_4);
 +    Fixups.push_back(MCFixup::Create(0, Expr, Kind, MI.getLoc()));
 +    return 0;
-+  } else{
-+    llvm_unreachable("Encoding of this operand type is not supported yet.");
 +  }
++
++  // Figure out the operand number, needed for isSrcOperand check
++  unsigned OpNo = 0;
++  for (unsigned e = MI.getNumOperands(); OpNo < e; ++OpNo) {
++    if (&MO == &MI.getOperand(OpNo))
++      break;
++  }
++
++  const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
++  if (isSrcOperand(Desc, OpNo)) {
++    uint32_t Enc = getLitEncoding(MO);
++    if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4))
++      return Enc;
++
++  } else if (MO.isImm())
++    return MO.getImm();
++
++  llvm_unreachable("Encoding of this operand type is not supported yet.");
 +  return 0;
 +}
 +
@@ -12936,10 +13447,10 @@ index 0000000..c47dc99
 +
 +unsigned SIMCCodeEmitter::GPRAlign(const MCInst &MI, unsigned OpNo,
 +                                   unsigned shift) const {
-+  unsigned regCode = getRegBinaryCode(MI.getOperand(OpNo).getReg());
-+  return regCode >> shift;
-+  return 0;
++  unsigned regCode = MRI.getEncodingValue(MI.getOperand(OpNo).getReg());
++  return (regCode & 0xff) >> shift;
 +}
++
 +unsigned SIMCCodeEmitter::GPR2AlignEncode(const MCInst &MI,
 +                                          unsigned OpNo ,
 +                                        SmallVectorImpl<MCFixup> &Fixup) const {
@@ -12951,120 +13462,6 @@ index 0000000..c47dc99
 +                                        SmallVectorImpl<MCFixup> &Fixup) const {
 +  return GPRAlign(MI, OpNo, 2);
 +}
-+
-+#define SMRD_OFFSET_MASK 0xff
-+#define SMRD_IMM_SHIFT 8
-+#define SMRD_SBASE_MASK 0x3f
-+#define SMRD_SBASE_SHIFT 9
-+/// This function is responsibe for encoding the offset
-+/// and the base ptr for SMRD instructions it should return a bit string in
-+/// this format:
-+///
-+/// OFFSET = bits{7-0}
-+/// IMM    = bits{8}
-+/// SBASE  = bits{14-9}
-+///
-+uint32_t SIMCCodeEmitter::SMRDmemriEncode(const MCInst &MI, unsigned OpNo,
-+                                        SmallVectorImpl<MCFixup> &Fixup) const {
-+  uint32_t Encoding;
-+
-+  const MCOperand &OffsetOp = MI.getOperand(OpNo + 1);
-+
-+  //XXX: Use this function for SMRD loads with register offsets
-+  assert(OffsetOp.isImm());
-+
-+  Encoding =
-+      (getMachineOpValue(MI, OffsetOp, Fixup) & SMRD_OFFSET_MASK)
-+    | (1 << SMRD_IMM_SHIFT) //XXX If the Offset is a register we shouldn't set this bit
-+    | ((GPR2AlignEncode(MI, OpNo, Fixup) & SMRD_SBASE_MASK) << SMRD_SBASE_SHIFT)
-+    ;
-+
-+  return Encoding;
-+}
-+
-+//===----------------------------------------------------------------------===//
-+// Post Encoder Callbacks
-+//===----------------------------------------------------------------------===//
-+
-+uint64_t SIMCCodeEmitter::VOPPostEncode(const MCInst &MI, uint64_t Value) const{
-+  unsigned encodingType = getEncodingType(MI);
-+  unsigned numSrcOps;
-+  unsigned vgprBitOffset;
-+
-+  if (encodingType == SIInstrEncodingType::VOP3) {
-+    numSrcOps = 3;
-+    vgprBitOffset = 32;
-+  } else {
-+    numSrcOps = 1;
-+    vgprBitOffset = 0;
-+  }
-+
-+  // Add one to skip over the destination reg operand.
-+  for (unsigned opIdx = 1; opIdx < numSrcOps + 1; opIdx++) {
-+    const MCOperand &MO = MI.getOperand(opIdx);
-+    if (MO.isReg()) {
-+      unsigned reg = MI.getOperand(opIdx).getReg();
-+      if (AMDGPUMCRegisterClasses[AMDGPU::VReg_32RegClassID].contains(reg) ||
-+          AMDGPUMCRegisterClasses[AMDGPU::VReg_64RegClassID].contains(reg)) {
-+        Value |= (VGPR_BIT(opIdx)) << vgprBitOffset;
-+      }
-+    } else if (MO.isFPImm()) {
-+      union {
-+        float f;
-+        uint32_t i;
-+      } Imm;
-+      // XXX: Not all instructions can use inline literals
-+      // XXX: We should make sure this is a 32-bit constant
-+      Imm.f = MO.getFPImm();
-+      Value |= ((uint64_t)Imm.i) << 32;
-+    }
-+  }
-+  return Value;
-+}
-+
-+//===----------------------------------------------------------------------===//
-+// Encoding helper functions
-+//===----------------------------------------------------------------------===//
-+
-+unsigned SIMCCodeEmitter::getEncodingType(const MCInst &MI) const {
-+  return MCII.get(MI.getOpcode()).TSFlags & SI_INSTR_FLAGS_ENCODING_MASK;
-+}
-+
-+unsigned SIMCCodeEmitter::getEncodingBytes(const MCInst &MI) const {
-+
-+  // These instructions aren't real instructions with an encoding type, so
-+  // we need to manually specify their size.
-+  switch (MI.getOpcode()) {
-+  default: break;
-+  case AMDGPU::SI_LOAD_LITERAL_I32:
-+  case AMDGPU::SI_LOAD_LITERAL_F32:
-+    return 4;
-+  }
-+
-+  unsigned encoding_type = getEncodingType(MI);
-+  switch (encoding_type) {
-+    case SIInstrEncodingType::EXP:
-+    case SIInstrEncodingType::LDS:
-+    case SIInstrEncodingType::MUBUF:
-+    case SIInstrEncodingType::MTBUF:
-+    case SIInstrEncodingType::MIMG:
-+    case SIInstrEncodingType::VOP3:
-+      return 8;
-+    default:
-+      return 4;
-+  }
-+}
-+
-+
-+unsigned SIMCCodeEmitter::getRegBinaryCode(unsigned reg) const {
-+  switch (reg) {
-+    case AMDGPU::M0: return 124;
-+    case AMDGPU::SREG_LIT_0: return 128;
-+    case AMDGPU::SI_LITERAL_CONSTANT: return 255;
-+    default: return MRI.getEncodingValue(reg);
-+  }
-+}
-+
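
The new getLitEncoding() replaces the removed post-encode hooks: values with
an inline hardware form get a code in 128..247, everything else returns 255
and EncodeInstruction() appends the 32-bit literal as an extra dword. A
standalone check of the integer half of that table (litEnc is a hypothetical
name):

    #include <cassert>
    #include <cstdint>

    // Integer inline constants, per getLitEncoding() above.
    static uint32_t litEnc(int32_t I) {
      if (I >= 0 && I <= 64)
        return 128 + I;   // 0..64 -> 128..192
      if (I >= -16 && I <= -1)
        return 192 - I;   // -1..-16 -> 193..208
      return 255;         // no inline form: a literal dword follows
    }

    int main() {
      assert(litEnc(0) == 128 && litEnc(64) == 192);
      assert(litEnc(-1) == 193 && litEnc(-16) == 208);
      assert(litEnc(100) == 255);
      return 0;
    }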
 diff --git a/lib/Target/R600/Makefile b/lib/Target/R600/Makefile
 new file mode 100644
 index 0000000..1b3ebbe
@@ -13096,10 +13493,10 @@ index 0000000..1b3ebbe
 +include $(LEVEL)/Makefile.common
 diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td
 new file mode 100644
-index 0000000..3dc1ecd
+index 0000000..868810c
 --- /dev/null
 +++ b/lib/Target/R600/Processors.td
-@@ -0,0 +1,29 @@
+@@ -0,0 +1,30 @@
 +//===-- Processors.td - TODO: Add brief description -------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -13115,6 +13512,7 @@ index 0000000..3dc1ecd
 +
 +class Proc<string Name, ProcessorItineraries itin, list<SubtargetFeature> Features>
 +: Processor<Name, itin, Features>;
++def : Proc<"",           R600_EG_Itin, [FeatureR600ALUInst]>;
 +def : Proc<"r600",       R600_EG_Itin, [FeatureR600ALUInst]>;
 +def : Proc<"rv710",      R600_EG_Itin, []>;
 +def : Proc<"rv730",      R600_EG_Itin, []>;
@@ -13131,10 +13529,10 @@ index 0000000..3dc1ecd
 +
 diff --git a/lib/Target/R600/R600Defines.h b/lib/Target/R600/R600Defines.h
 new file mode 100644
-index 0000000..7dea8e4
+index 0000000..16cfcf5
 --- /dev/null
 +++ b/lib/Target/R600/R600Defines.h
-@@ -0,0 +1,79 @@
+@@ -0,0 +1,97 @@
 +//===-- R600Defines.h - R600 Helper Macros ----------------------*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -13186,6 +13584,9 @@ index 0000000..7dea8e4
 +#define HW_REG_MASK 0x1ff
 +#define HW_CHAN_SHIFT 9
 +
++#define GET_REG_CHAN(reg) ((reg) >> HW_CHAN_SHIFT)
++#define GET_REG_INDEX(reg) ((reg) & HW_REG_MASK)
++
 +namespace R600Operands {
 +  enum Ops {
 +    DST,
@@ -13199,27 +13600,42 @@ index 0000000..7dea8e4
 +    SRC0_NEG,
 +    SRC0_REL,
 +    SRC0_ABS,
++    SRC0_SEL,
 +    SRC1,
 +    SRC1_NEG,
 +    SRC1_REL,
 +    SRC1_ABS,
++    SRC1_SEL,
 +    SRC2,
 +    SRC2_NEG,
 +    SRC2_REL,
++    SRC2_SEL,
 +    LAST,
 +    PRED_SEL,
 +    IMM,
 +    COUNT
 + };
++
++  const static int ALUOpTable[3][R600Operands::COUNT] = {
++//            W        C     S  S  S  S     S  S  S  S     S  S  S
++//            R  O  D  L  S  R  R  R  R  S  R  R  R  R  S  R  R  R  L  P
++//   D  U     I  M  R  A  R  C  C  C  C  R  C  C  C  C  R  C  C  C  A  R  I
++//   S  E  U  T  O  E  M  C  0  0  0  0  C  1  1  1  1  C  2  2  2  S  E  M
++//   T  M  P  E  D  L  P  0  N  R  A  S  1  N  R  A  S  2  N  R  S  T  D  M
++    {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,11,12},
++    {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,13,14,15,16,-1,-1,-1,-1,17,18,19},
++    {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8, 9,-1,10,11,12,13,14,15,16,17}
++  };
++
 +}
 +
 +#endif // R600DEFINES_H_
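
ALUOpTable maps each logical operand role to its machine-operand index for
one-, two-, and three-source ALU encodings, with -1 marking roles an encoding
lacks. Example lookup (toy usage; the ISel changes above presumably reach
this data through R600InstrInfo::getOperandIdx()):

    // Operand index of SRC1_NEG in a two-source (OP2) instruction:
    int Idx = R600Operands::ALUOpTable[/*SrcNum*/ 2 - 1][R600Operands::SRC1_NEG];
    // Idx == 13 here; the same lookup in the one-source row returns -1
    // because OP1 instructions have no second source.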
 diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
 new file mode 100644
-index 0000000..b6e62b7
+index 0000000..c00c349
 --- /dev/null
 +++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
-@@ -0,0 +1,334 @@
+@@ -0,0 +1,290 @@
 +//===-- R600ExpandSpecialInstrs.cpp - Expand special instructions ---------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -13277,118 +13693,6 @@ index 0000000..b6e62b7
 +  return new R600ExpandSpecialInstrsPass(TM);
 +}
 +
-+bool R600ExpandSpecialInstrsPass::ExpandInputPerspective(MachineInstr &MI) {
-+  const R600RegisterInfo &TRI = TII->getRegisterInfo();
-+  if (MI.getOpcode() != AMDGPU::input_perspective)
-+    return false;
-+
-+  MachineBasicBlock::iterator I = &MI;
-+  unsigned DstReg = MI.getOperand(0).getReg();
-+  R600MachineFunctionInfo *MFI = MI.getParent()->getParent()
-+      ->getInfo<R600MachineFunctionInfo>();
-+  unsigned IJIndexBase;
-+
-+  // In Evergreen ISA doc section 8.3.2 :
-+  // We need to interpolate XY and ZW in two different instruction groups.
-+  // An INTERP_* must occupy all 4 slots of an instruction group.
-+  // Output of INTERP_XY is written in X,Y slots
-+  // Output of INTERP_ZW is written in Z,W slots
-+  //
-+  // Thus interpolation requires the following sequences :
-+  //
-+  // AnyGPR.x = INTERP_ZW; (Write Masked Out)
-+  // AnyGPR.y = INTERP_ZW; (Write Masked Out)
-+  // DstGPR.z = INTERP_ZW;
-+  // DstGPR.w = INTERP_ZW; (End of first IG)
-+  // DstGPR.x = INTERP_XY;
-+  // DstGPR.y = INTERP_XY;
-+  // AnyGPR.z = INTERP_XY; (Write Masked Out)
-+  // AnyGPR.w = INTERP_XY; (Write Masked Out) (End of second IG)
-+  //
-+  switch (MI.getOperand(1).getImm()) {
-+  case 0:
-+    IJIndexBase = MFI->GetIJPerspectiveIndex();
-+    break;
-+  case 1:
-+    IJIndexBase = MFI->GetIJLinearIndex();
-+    break;
-+  default:
-+    assert(0 && "Unknow ij index");
-+  }
-+
-+  for (unsigned i = 0; i < 8; i++) {
-+    unsigned IJIndex = AMDGPU::R600_TReg32RegClass.getRegister(
-+        2 * IJIndexBase + ((i + 1) % 2));
-+    unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
-+        MI.getOperand(2).getImm());
-+
-+
-+    unsigned Sel = AMDGPU::sel_x;
-+    switch (i % 4) {
-+    case 0:Sel = AMDGPU::sel_x;break;
-+    case 1:Sel = AMDGPU::sel_y;break;
-+    case 2:Sel = AMDGPU::sel_z;break;
-+    case 3:Sel = AMDGPU::sel_w;break;
-+    default:break;
-+    }
-+
-+    unsigned Res = TRI.getSubReg(DstReg, Sel);
-+
-+    unsigned Opcode = (i < 4)?AMDGPU::INTERP_ZW:AMDGPU::INTERP_XY;
-+
-+    MachineBasicBlock &MBB = *(MI.getParent());
-+    MachineInstr *NewMI =
-+        TII->buildDefaultInstruction(MBB, I, Opcode, Res, IJIndex, ReadReg);
-+
-+    if (!(i> 1 && i < 6)) {
-+      TII->addFlag(NewMI, 0, MO_FLAG_MASK);
-+    }
-+
-+    if (i % 4 !=  3)
-+      TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
-+  }
-+
-+  MI.eraseFromParent();
-+
-+  return true;
-+}
-+
-+bool R600ExpandSpecialInstrsPass::ExpandInputConstant(MachineInstr &MI) {
-+  const R600RegisterInfo &TRI = TII->getRegisterInfo();
-+  if (MI.getOpcode() != AMDGPU::input_constant)
-+    return false;
-+
-+  MachineBasicBlock::iterator I = &MI;
-+  unsigned DstReg = MI.getOperand(0).getReg();
-+
-+  for (unsigned i = 0; i < 4; i++) {
-+    unsigned ReadReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
-+        MI.getOperand(1).getImm());
-+
-+    unsigned Sel = AMDGPU::sel_x;
-+    switch (i % 4) {
-+    case 0:Sel = AMDGPU::sel_x;break;
-+    case 1:Sel = AMDGPU::sel_y;break;
-+    case 2:Sel = AMDGPU::sel_z;break;
-+    case 3:Sel = AMDGPU::sel_w;break;
-+    default:break;
-+    }
-+
-+    unsigned Res = TRI.getSubReg(DstReg, Sel);
-+
-+    MachineBasicBlock &MBB = *(MI.getParent());
-+    MachineInstr *NewMI = TII->buildDefaultInstruction(
-+        MBB, I, AMDGPU::INTERP_LOAD_P0, Res, ReadReg);
-+
-+    if (i % 4 !=  3)
-+      TII->addFlag(NewMI, 0, MO_FLAG_NOT_LAST);
-+  }
-+
-+  MI.eraseFromParent();
-+
-+  return true;
-+}
-+
 +bool R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF) {
 +
 +  const R600RegisterInfo &TRI = TII->getRegisterInfo();
@@ -13422,7 +13726,7 @@ index 0000000..b6e62b7
 +        MI.eraseFromParent();
 +        continue;
 +        }
-+      case AMDGPU::BREAK:
++      case AMDGPU::BREAK: {
 +        MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,
 +                                          AMDGPU::PRED_SETE_INT,
 +                                          AMDGPU::PREDICATE_BIT,
@@ -13436,12 +13740,81 @@ index 0000000..b6e62b7
 +                .addReg(AMDGPU::PREDICATE_BIT);
 +        MI.eraseFromParent();
 +        continue;
-+    }
++        }
 +
-+    if (ExpandInputPerspective(MI))
-+      continue;
-+    if (ExpandInputConstant(MI))
-+      continue;
++      case AMDGPU::INTERP_PAIR_XY: {
++        MachineInstr *BMI;
++        unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++                MI.getOperand(2).getImm());
++
++        for (unsigned Chan = 0; Chan < 4; ++Chan) {
++          unsigned DstReg;
++
++          if (Chan < 2)
++            DstReg = MI.getOperand(Chan).getReg();
++          else
++            DstReg = Chan == 2 ? AMDGPU::T0_Z : AMDGPU::T0_W;
++
++          BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_XY,
++              DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
++
++          BMI->setIsInsideBundle(Chan > 0);
++          if (Chan >= 2)
++            TII->addFlag(BMI, 0, MO_FLAG_MASK);
++          if (Chan != 3)
++            TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++        }
++
++        MI.eraseFromParent();
++        continue;
++        }
++
++      case AMDGPU::INTERP_PAIR_ZW: {
++        MachineInstr *BMI;
++        unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++                MI.getOperand(2).getImm());
++
++        for (unsigned Chan = 0; Chan < 4; ++Chan) {
++          unsigned DstReg;
++
++          if (Chan < 2)
++            DstReg = Chan == 0 ? AMDGPU::T0_X : AMDGPU::T0_Y;
++          else
++            DstReg = MI.getOperand(Chan-2).getReg();
++
++          BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_ZW,
++              DstReg, MI.getOperand(3 + (Chan % 2)).getReg(), PReg);
++
++          BMI->setIsInsideBundle(Chan > 0);
++          if (Chan < 2)
++            TII->addFlag(BMI, 0, MO_FLAG_MASK);
++          if (Chan != 3)
++            TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++        }
++
++        MI.eraseFromParent();
++        continue;
++        }
++
++      case AMDGPU::INTERP_VEC_LOAD: {
++        const R600RegisterInfo &TRI = TII->getRegisterInfo();
++        MachineInstr *BMI;
++        unsigned PReg = AMDGPU::R600_ArrayBaseRegClass.getRegister(
++                MI.getOperand(1).getImm());
++        unsigned DstReg = MI.getOperand(0).getReg();
++
++        for (unsigned Chan = 0; Chan < 4; ++Chan) {
++          BMI = TII->buildDefaultInstruction(MBB, I, AMDGPU::INTERP_LOAD_P0,
++              TRI.getSubReg(DstReg, TRI.getSubRegFromChannel(Chan)), PReg);
++          BMI->setIsInsideBundle(Chan > 0);
++          if (Chan != 3)
++            TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
++        }
++
++        MI.eraseFromParent();
++        continue;
++        }
++      }
 +
 +      bool IsReduction = TII->isReductionOp(MI.getOpcode());
 +      bool IsVector = TII->isVector(MI);
@@ -13540,8 +13913,7 @@ index 0000000..b6e62b7
 +        MachineInstr *NewMI =
 +          TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);
 +
-+        if (Chan != 0)
-+          NewMI->bundleWithPred();
++        NewMI->setIsInsideBundle(Chan != 0);
 +        if (Mask) {
 +          TII->addFlag(NewMI, 0, MO_FLAG_MASK);
 +        }
@@ -13556,10 +13928,10 @@ index 0000000..b6e62b7
 +}
 diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
 new file mode 100644
-index 0000000..d6b9d90
+index 0000000..9c38522
 --- /dev/null
 +++ b/lib/Target/R600/R600ISelLowering.cpp
-@@ -0,0 +1,909 @@
+@@ -0,0 +1,1195 @@
 +//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -13580,6 +13952,7 @@ index 0000000..d6b9d90
 +#include "R600MachineFunctionInfo.h"
 +#include "llvm/Argument.h"
 +#include "llvm/Function.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
 +#include "llvm/CodeGen/MachineInstrBuilder.h"
 +#include "llvm/CodeGen/MachineRegisterInfo.h"
 +#include "llvm/CodeGen/SelectionDAG.h"
@@ -13633,10 +14006,27 @@ index 0000000..d6b9d90
 +  setOperationAction(ISD::SELECT, MVT::i32, Custom);
 +  setOperationAction(ISD::SELECT, MVT::f32, Custom);
 +
++  // Legalize loads and stores to the private address space.
++  setOperationAction(ISD::LOAD, MVT::i32, Custom);
++  setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
++  setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
++  setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom);
++  setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom);
++  setLoadExtAction(ISD::ZEXTLOAD, MVT::i8, Custom);
++  setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom);
++  setOperationAction(ISD::STORE, MVT::i8, Custom);
 +  setOperationAction(ISD::STORE, MVT::i32, Custom);
++  setOperationAction(ISD::STORE, MVT::v2i32, Custom);
 +  setOperationAction(ISD::STORE, MVT::v4i32, Custom);
 +
++  setOperationAction(ISD::LOAD, MVT::i32, Custom);
++  setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
++  setOperationAction(ISD::FrameIndex, MVT::i32, Custom);
++
 +  setTargetDAGCombine(ISD::FP_ROUND);
++  setTargetDAGCombine(ISD::FP_TO_SINT);
++  setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
++  setTargetDAGCombine(ISD::SELECT_CC);
 +
 +  setSchedulingPreference(Sched::VLIW);
 +}
@@ -13677,15 +14067,6 @@ index 0000000..d6b9d90
 +    break;
 +  }
 +
-+  case AMDGPU::R600_LOAD_CONST: {
-+    int64_t RegIndex = MI->getOperand(1).getImm();
-+    unsigned ConstantReg = AMDGPU::R600_CReg32RegClass.getRegister(RegIndex);
-+    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::COPY))
-+                .addOperand(MI->getOperand(0))
-+                .addReg(ConstantReg);
-+    break;
-+  }
-+
 +  case AMDGPU::MASK_WRITE: {
 +    unsigned maskedRegister = MI->getOperand(0).getReg();
 +    assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));
@@ -13716,18 +14097,6 @@ index 0000000..d6b9d90
 +    break;
 +  }
 +
-+  case AMDGPU::RESERVE_REG: {
-+    R600MachineFunctionInfo * MFI = MF->getInfo<R600MachineFunctionInfo>();
-+    int64_t ReservedIndex = MI->getOperand(0).getImm();
-+    unsigned ReservedReg =
-+                         AMDGPU::R600_TReg32RegClass.getRegister(ReservedIndex);
-+    MFI->ReservedRegs.push_back(ReservedReg);
-+    unsigned SuperReg =
-+          AMDGPU::R600_Reg128RegClass.getRegister(ReservedIndex / 4);
-+    MFI->ReservedRegs.push_back(SuperReg);
-+    break;
-+  }
-+
 +  case AMDGPU::TXD: {
 +    unsigned T0 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
 +    unsigned T1 = MRI.createVirtualRegister(&AMDGPU::R600_Reg128RegClass);
@@ -13812,33 +14181,26 @@ index 0000000..d6b9d90
 +    break;
 +  }
 +
-+  case AMDGPU::input_perspective: {
-+    R600MachineFunctionInfo *MFI = MF->getInfo<R600MachineFunctionInfo>();
-+
-+    // XXX Be more fine about register reservation
-+    for (unsigned i = 0; i < 4; i ++) {
-+      unsigned ReservedReg = AMDGPU::R600_TReg32RegClass.getRegister(i);
-+      MFI->ReservedRegs.push_back(ReservedReg);
-+    }
-+
-+    switch (MI->getOperand(1).getImm()) {
-+    case 0:// Perspective
-+      MFI->HasPerspectiveInterpolation = true;
-+      break;
-+    case 1:// Linear
-+      MFI->HasLinearInterpolation = true;
-+      break;
-+    default:
-+      assert(0 && "Unknown ij index");
-+    }
-+
-+    return BB;
-+  }
-+
 +  case AMDGPU::EG_ExportSwz:
 +  case AMDGPU::R600_ExportSwz: {
++    // The instruction is left unmodified if it is not the last one of its type
++    bool isLastInstructionOfItsType = true;
++    unsigned InstExportType = MI->getOperand(1).getImm();
++    for (MachineBasicBlock::iterator NextExportInst = llvm::next(I),
++         EndBlock = BB->end(); NextExportInst != EndBlock;
++         NextExportInst = llvm::next(NextExportInst)) {
++      if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz ||
++          NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) {
++        unsigned CurrentInstExportType = NextExportInst->getOperand(1)
++            .getImm();
++        if (CurrentInstExportType == InstExportType) {
++          isLastInstructionOfItsType = false;
++          break;
++        }
++      }
++    }
 +    bool EOP = (llvm::next(I)->getOpcode() == AMDGPU::RETURN)? 1 : 0;
-+    if (!EOP)
++    if (!EOP && !isLastInstructionOfItsType)
 +      return BB;
 +    unsigned CfInst = (MI->getOpcode() == AMDGPU::EG_ExportSwz)? 84 : 40;
 +    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI->getOpcode()))
@@ -13850,7 +14212,7 @@ index 0000000..d6b9d90
 +            .addOperand(MI->getOperand(5))
 +            .addOperand(MI->getOperand(6))
 +            .addImm(CfInst)
-+            .addImm(1);
++            .addImm(EOP);
 +    break;
 +  }
 +  }
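Stated plainly: an export gets its CF instruction only when it is the last export of its type in the block, or when a RETURN follows it (which also sets EOP). A minimal sketch of that scan over plain data, with a vector of type tags standing in for the instruction stream (names are hypothetical):

    #include <cstddef>
    #include <vector>

    bool isLastOfItsType(const std::vector<unsigned> &ExportTypes,
                         std::size_t I) {
      for (std::size_t J = I + 1; J < ExportTypes.size(); ++J)
        if (ExportTypes[J] == ExportTypes[I])
          return false;  // a later export shares this type
      return true;
    }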
@@ -13926,7 +14288,9 @@ index 0000000..d6b9d90
 +  case ISD::SELECT: return LowerSELECT(Op, DAG);
 +  case ISD::SETCC: return LowerSETCC(Op, DAG);
 +  case ISD::STORE: return LowerSTORE(Op, DAG);
++  case ISD::LOAD: return LowerLOAD(Op, DAG);
 +  case ISD::FPOW: return LowerFPOW(Op, DAG);
++  case ISD::FrameIndex: return LowerFrameIndex(Op, DAG);
 +  case ISD::INTRINSIC_VOID: {
 +    SDValue Chain = Op.getOperand(0);
 +    unsigned IntrinsicID =
@@ -13953,39 +14317,7 @@ index 0000000..d6b9d90
 +          Chain);
 +
 +    }
-+    case AMDGPUIntrinsic::R600_store_stream_output : {
-+      MachineFunction &MF = DAG.getMachineFunction();
-+      R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();
-+      int64_t RegIndex = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue();
-+      int64_t BufIndex = cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue();
-+
-+      SDNode **OutputsMap = MFI->StreamOutputs[BufIndex];
-+      unsigned Inst;
-+      switch (cast<ConstantSDNode>(Op.getOperand(4))->getZExtValue()  ) {
-+      // STREAM3
-+      case 3:
-+        Inst = 4;
-+        break;
-+      // STREAM2
-+      case 2:
-+        Inst = 3;
-+        break;
-+      // STREAM1
-+      case 1:
-+        Inst = 2;
-+        break;
-+      // STREAM0
-+      case 0:
-+        Inst = 1;
-+        break;
-+      default:
-+        assert(0 && "Wrong buffer id for stream outputs !");
-+      }
 +
-+      return InsertScalarToRegisterExport(DAG, Op.getDebugLoc(), OutputsMap,
-+          RegIndex / 4, RegIndex % 4, Inst, 0, Op.getOperand(2),
-+          Chain);
-+    }
 +    // default for switch(IntrinsicID)
 +    default: break;
 +    }
@@ -14004,38 +14336,35 @@ index 0000000..d6b9d90
 +      unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister(RegIndex);
 +      return CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass, Reg, VT);
 +    }
-+    case AMDGPUIntrinsic::R600_load_input_perspective: {
-+      int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+      if (slot < 0)
-+        return DAG.getUNDEF(MVT::f32);
-+      SDValue FullVector = DAG.getNode(
-+          AMDGPUISD::INTERP,
-+          DL, MVT::v4f32,
-+          DAG.getConstant(0, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
-+      return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+        DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
-+    }
-+    case AMDGPUIntrinsic::R600_load_input_linear: {
-+      int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+      if (slot < 0)
-+        return DAG.getUNDEF(MVT::f32);
-+      SDValue FullVector = DAG.getNode(
-+        AMDGPUISD::INTERP,
-+        DL, MVT::v4f32,
-+        DAG.getConstant(1, MVT::i32), DAG.getConstant(slot / 4 , MVT::i32));
-+      return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+        DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
-+    }
-+    case AMDGPUIntrinsic::R600_load_input_constant: {
++
++    case AMDGPUIntrinsic::R600_interp_input: {
 +      int slot = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
-+      if (slot < 0)
-+        return DAG.getUNDEF(MVT::f32);
-+      SDValue FullVector = DAG.getNode(
-+        AMDGPUISD::INTERP_P0,
-+        DL, MVT::v4f32,
-+        DAG.getConstant(slot / 4 , MVT::i32));
-+      return DAG.getNode(ISD::EXTRACT_VECTOR_ELT,
-+          DL, VT, FullVector, DAG.getConstant(slot % 4, MVT::i32));
++      int ijb = cast<ConstantSDNode>(Op.getOperand(2))->getSExtValue();
++      MachineSDNode *interp;
++      if (ijb < 0) {
++        interp = DAG.getMachineNode(AMDGPU::INTERP_VEC_LOAD, DL,
++            MVT::v4f32, DAG.getTargetConstant(slot / 4 , MVT::i32));
++        return DAG.getTargetExtractSubreg(
++            TII->getRegisterInfo().getSubRegFromChannel(slot % 4),
++            DL, MVT::f32, SDValue(interp, 0));
++      }
++
++      if (slot % 4 < 2)
++        interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_XY, DL,
++            MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
++            CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++                AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
++            CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++                AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
++      else
++        interp = DAG.getMachineNode(AMDGPU::INTERP_PAIR_ZW, DL,
++            MVT::f32, MVT::f32, DAG.getTargetConstant(slot / 4 , MVT::i32),
++            CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++                AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb + 1), MVT::f32),
++            CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
++                AMDGPU::R600_TReg32RegClass.getRegister(2 * ijb), MVT::f32));
++
++      return SDValue(interp, slot % 2);
 +    }
 +
 +    case r600_read_ngroups_x:
@@ -14089,6 +14418,20 @@ index 0000000..d6b9d90
 +  switch (N->getOpcode()) {
 +  default: return;
 +  case ISD::FP_TO_UINT: Results.push_back(LowerFPTOUINT(N->getOperand(0), DAG));
++    return;
++  case ISD::LOAD: {
++    SDNode *Node = LowerLOAD(SDValue(N, 0), DAG).getNode();
++    Results.push_back(SDValue(Node, 0));
++    Results.push_back(SDValue(Node, 1));
++    // XXX: LLVM seems not to replace Chain Value inside CustomWidenLowerNode
++    // function
++    DAG.ReplaceAllUsesOfValueWith(SDValue(N,1), SDValue(Node, 1));
++    return;
++  }
++  case ISD::STORE:
++    SDNode *Node = LowerSTORE(SDValue(N, 0), DAG).getNode();
++    Results.push_back(SDValue(Node, 0));
++    return;
 +  }
 +}
 +
@@ -14156,6 +14499,20 @@ index 0000000..d6b9d90
 +                     false, false, false, 0);
 +}
 +
++SDValue R600TargetLowering::LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const {
++
++  MachineFunction &MF = DAG.getMachineFunction();
++  const AMDGPUFrameLowering *TFL =
++   static_cast<const AMDGPUFrameLowering*>(getTargetMachine().getFrameLowering());
++
++  FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Op);
++  assert(FIN);
++
++  unsigned FrameIndex = FIN->getIndex();
++  unsigned Offset = TFL->getFrameIndexOffset(MF, FrameIndex);
++  return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), MVT::i32);
++}
++
 +SDValue R600TargetLowering::LowerROTL(SDValue Op, SelectionDAG &DAG) const {
 +  DebugLoc DL = Op.getDebugLoc();
 +  EVT VT = Op.getValueType();
@@ -14242,9 +14599,12 @@ index 0000000..d6b9d90
 +  }
 +
 +  // Try to lower to a SET* instruction:
-+  // We need all the operands of SELECT_CC to have the same value type, so if
-+  // necessary we need to change True and False to be the same type as LHS and
-+  // RHS, and then convert the result of the select_cc back to the correct type.
++  //
++  // CompareVT == MVT::f32 and VT == MVT::i32 is supported by the hardware,
++  // but for the other case where CompareVT != VT, all operands of
++  // SELECT_CC need to have the same value type, so we need to change True and
++  // False to be the same type as LHS and RHS, and then convert the result of
++  // the select_cc back to the correct type.
 +
 +  // Move hardware True/False values to the correct operand.
 +  if (isHWTrueValue(False) && isHWFalseValue(True)) {
@@ -14254,32 +14614,17 @@ index 0000000..d6b9d90
 +  }
 +
 +  if (isHWTrueValue(True) && isHWFalseValue(False)) {
-+    if (CompareVT !=  VT) {
-+      if (VT == MVT::f32 && CompareVT == MVT::i32) {
-+        SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-+            LHS, RHS,
-+            DAG.getConstant(-1, MVT::i32),
-+            DAG.getConstant(0, MVT::i32),
-+            CC);
-+        // Convert integer values of true (-1) and false (0) to fp values of
-+        // true (1.0f) and false (0.0f).
-+        SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
-+                                                  DAG.getConstant(1, MVT::i32));
-+        return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
-+      } else if (VT == MVT::i32 && CompareVT == MVT::f32) {
-+        SDValue BoolAsFlt = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-+            LHS, RHS,
-+            DAG.getConstantFP(1.0f, MVT::f32),
-+            DAG.getConstantFP(0.0f, MVT::f32),
-+            CC);
-+        // Convert fp values of true (1.0f) and false (0.0f) to integer values
-+        // of true (-1) and false (0).
-+        SDValue Neg = DAG.getNode(ISD::FNEG, DL, MVT::f32, BoolAsFlt);
-+        return DAG.getNode(ISD::FP_TO_SINT, DL, VT, Neg);
-+      } else {
-+        // I don't think there will be any other type pairings.
-+        assert(!"Unhandled operand type parings in SELECT_CC");
-+      }
++    if (CompareVT !=  VT && VT == MVT::f32 && CompareVT == MVT::i32) {
++      SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
++          LHS, RHS,
++          DAG.getConstant(-1, MVT::i32),
++          DAG.getConstant(0, MVT::i32),
++          CC);
++      // Convert integer values of true (-1) and false (0) to fp values of
++      // true (1.0f) and false (0.0f).
++      SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
++                                                DAG.getConstant(1, MVT::i32));
++      return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
 +    } else {
 +      // This SELECT_CC is already legal.
 +      return DAG.getNode(ISD::SELECT_CC, DL, VT, LHS, RHS, True, False, CC);
@@ -14370,6 +14715,61 @@ index 0000000..d6b9d90
 +  return Cond;
 +}
 +
++/// LLVM generates byte-addressed pointers.  For indirect addressing, we need
++/// to convert these pointers to a register index.  Each register holds
++/// 16 bytes (4 x 32-bit sub-registers), but we need to take into account the
++/// \p StackWidth, which tells us how many of the 4 sub-registers will be used
++/// for indirect addressing.
++SDValue R600TargetLowering::stackPtrToRegIndex(SDValue Ptr,
++                                               unsigned StackWidth,
++                                               SelectionDAG &DAG) const {
++  unsigned SRLPad;
++  switch(StackWidth) {
++  case 1:
++    SRLPad = 2;
++    break;
++  case 2:
++    SRLPad = 3;
++    break;
++  case 4:
++    SRLPad = 4;
++    break;
++  default: llvm_unreachable("Invalid stack width");
++  }
++
++  return DAG.getNode(ISD::SRL, Ptr.getDebugLoc(), Ptr.getValueType(), Ptr,
++                     DAG.getConstant(SRLPad, MVT::i32));
++}
++
++void R600TargetLowering::getStackAddress(unsigned StackWidth,
++                                         unsigned ElemIdx,
++                                         unsigned &Channel,
++                                         unsigned &PtrIncr) const {
++  switch (StackWidth) {
++  default:
++  case 1:
++    Channel = 0;
++    if (ElemIdx > 0) {
++      PtrIncr = 1;
++    } else {
++      PtrIncr = 0;
++    }
++    break;
++  case 2:
++    Channel = ElemIdx % 2;
++    if (ElemIdx == 2) {
++      PtrIncr = 1;
++    } else {
++      PtrIncr = 0;
++    }
++    break;
++  case 4:
++    Channel = ElemIdx;
++    PtrIncr = 0;
++    break;
++  }
++}
++
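Taken together, the two helpers above reduce to small integer arithmetic. A standalone sketch, where the shift table and the channel/increment mapping mirror the switches and the driver loop is illustrative:

    #include <cstdio>

    unsigned regIndex(unsigned BytePtr, unsigned StackWidth) {
      // Each register holds 16 bytes; only StackWidth channels are used.
      unsigned SRLPad = (StackWidth == 1) ? 2 : (StackWidth == 2) ? 3 : 4;
      return BytePtr >> SRLPad;
    }

    void stackAddress(unsigned StackWidth, unsigned ElemIdx,
                      unsigned &Channel, unsigned &PtrIncr) {
      switch (StackWidth) {
      default:
      case 1: Channel = 0;           PtrIncr = ElemIdx > 0;  break;
      case 2: Channel = ElemIdx % 2; PtrIncr = ElemIdx == 2; break;
      case 4: Channel = ElemIdx;     PtrIncr = 0;            break;
      }
    }

    int main() {
      std::printf("byte ptr 48, width 2 -> reg index %u\n", regIndex(48, 2));
      for (unsigned Elem = 0; Elem < 4; ++Elem) {
        unsigned Chan, Incr;
        stackAddress(2, Elem, Chan, Incr);  // StackWidth == 2
        std::printf("elem %u -> channel %u, ptr += %u\n", Elem, Chan, Incr);
      }
      return 0;
    }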
 +SDValue R600TargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
 +  DebugLoc DL = Op.getDebugLoc();
 +  StoreSDNode *StoreNode = cast<StoreSDNode>(Op);
@@ -14391,23 +14791,202 @@ index 0000000..d6b9d90
 +    }
 +    return Chain;
 +  }
-+  return SDValue();
-+}
 +
++  EVT ValueVT = Value.getValueType();
 +
-+SDValue R600TargetLowering::LowerFPOW(SDValue Op,
-+    SelectionDAG &DAG) const {
-+  DebugLoc DL = Op.getDebugLoc();
-+  EVT VT = Op.getValueType();
-+  SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
-+  SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
-+  return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
++  if (StoreNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
++    return SDValue();
++  }
++
++  // Lowering for indirect addressing
++
++  const MachineFunction &MF = DAG.getMachineFunction();
++  const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
++                                         getTargetMachine().getFrameLowering());
++  unsigned StackWidth = TFL->getStackWidth(MF);
++
++  Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
++
++  if (ValueVT.isVector()) {
++    unsigned NumElemVT = ValueVT.getVectorNumElements();
++    EVT ElemVT = ValueVT.getVectorElementType();
++    SDValue Stores[4];
++
++    assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
++                                      "vector width in store");
++
++    for (unsigned i = 0; i < NumElemVT; ++i) {
++      unsigned Channel, PtrIncr;
++      getStackAddress(StackWidth, i, Channel, PtrIncr);
++      Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
++                        DAG.getConstant(PtrIncr, MVT::i32));
++      SDValue Elem = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ElemVT,
++                                 Value, DAG.getConstant(i, MVT::i32));
++
++      Stores[i] = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other,
++                              Chain, Elem, Ptr,
++                              DAG.getTargetConstant(Channel, MVT::i32));
++    }
++    Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Stores, NumElemVT);
++  } else {
++    if (ValueVT == MVT::i8) {
++      Value = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Value);
++    }
++    Chain = DAG.getNode(AMDGPUISD::REGISTER_STORE, DL, MVT::Other,
++                        Chain, Value, Ptr,
++                        DAG.getTargetConstant(0, MVT::i32)); // Channel
++  }
++
++  return Chain;
 +}
 +
-+/// XXX Only kernel functions are supported, so we can assume for now that
-+/// every function is a kernel function, but in the future we should use
-+/// separate calling conventions for kernel and non-kernel functions.
-+SDValue R600TargetLowering::LowerFormalArguments(
++// Returns 512 + (kc_bank << 12)
++static int
++ConstantAddressBlock(unsigned AddressSpace) {
++  switch (AddressSpace) {
++  case AMDGPUAS::CONSTANT_BUFFER_0:
++    return 512;
++  case AMDGPUAS::CONSTANT_BUFFER_1:
++    return 512 + 4096;
++  case AMDGPUAS::CONSTANT_BUFFER_2:
++    return 512 + 4096 * 2;
++  case AMDGPUAS::CONSTANT_BUFFER_3:
++    return 512 + 4096 * 3;
++  case AMDGPUAS::CONSTANT_BUFFER_4:
++    return 512 + 4096 * 4;
++  case AMDGPUAS::CONSTANT_BUFFER_5:
++    return 512 + 4096 * 5;
++  case AMDGPUAS::CONSTANT_BUFFER_6:
++    return 512 + 4096 * 6;
++  case AMDGPUAS::CONSTANT_BUFFER_7:
++    return 512 + 4096 * 7;
++  case AMDGPUAS::CONSTANT_BUFFER_8:
++    return 512 + 4096 * 8;
++  case AMDGPUAS::CONSTANT_BUFFER_9:
++    return 512 + 4096 * 9;
++  case AMDGPUAS::CONSTANT_BUFFER_10:
++    return 512 + 4096 * 10;
++  case AMDGPUAS::CONSTANT_BUFFER_11:
++    return 512 + 4096 * 11;
++  case AMDGPUAS::CONSTANT_BUFFER_12:
++    return 512 + 4096 * 12;
++  case AMDGPUAS::CONSTANT_BUFFER_13:
++    return 512 + 4096 * 13;
++  case AMDGPUAS::CONSTANT_BUFFER_14:
++    return 512 + 4096 * 14;
++  case AMDGPUAS::CONSTANT_BUFFER_15:
++    return 512 + 4096 * 15;
++  default:
++    return -1;
++  }
++}
++
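Assuming the CONSTANT_BUFFER_* address spaces are numbered consecutively, the whole switch above collapses to the closed form given in its comment; a sketch:

    int constantAddressBlock(int KCBank) {  // KCBank in [0, 15]
      return 512 + (KCBank << 12);          // 512 + KCBank * 4096
    }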
++SDValue R600TargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const
++{
++  EVT VT = Op.getValueType();
++  DebugLoc DL = Op.getDebugLoc();
++  LoadSDNode *LoadNode = cast<LoadSDNode>(Op);
++  SDValue Chain = Op.getOperand(0);
++  SDValue Ptr = Op.getOperand(1);
++  SDValue LoweredLoad;
++
++  int ConstantBlock = ConstantAddressBlock(LoadNode->getAddressSpace());
++  if (ConstantBlock > -1) {
++    SDValue Result;
++    if (dyn_cast<ConstantExpr>(LoadNode->getSrcValue()) ||
++        dyn_cast<Constant>(LoadNode->getSrcValue())) {
++      SDValue Slots[4];
++      for (unsigned i = 0; i < 4; i++) {
++        // We want the Const position encoded with the following formula:
++        // (((512 + (kc_bank << 12) + const_index) << 2) + chan)
++        // const_index is Ptr computed by llvm using an alignment of 16.
++        // Thus we add (((512 + (kc_bank << 12)) << 2) + chan) * 4 here and
++        // then divide by 4 at the ISel step.
++        SDValue NewPtr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr,
++            DAG.getConstant(4 * i + ConstantBlock * 16, MVT::i32));
++        Slots[i] = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::i32, NewPtr);
++      }
++      Result = DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v4i32, Slots, 4);
++    } else {
++      // A non-constant ptr can't be folded; keep it as a v4i32 load
++      Result = DAG.getNode(AMDGPUISD::CONST_ADDRESS, DL, MVT::v4i32,
++          DAG.getNode(ISD::SRL, DL, MVT::i32, Ptr, DAG.getConstant(4, MVT::i32))
++          );
++    }
++
++    if (!VT.isVector()) {
++      Result = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Result,
++          DAG.getConstant(0, MVT::i32));
++    }
++
++    SDValue MergedValues[2] = {
++        Result,
++        Chain
++    };
++    return DAG.getMergeValues(MergedValues, 2, DL);
++  }
++
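A quick arithmetic check of the encoding comment above, with hypothetical values. The pointer is 16 * const_index because of the 16-byte alignment, and dividing the sum by 4 recovers the documented formula:

    #include <cassert>

    int main() {
      unsigned KCBank = 1, ConstIndex = 3, Chan = 2;  // hypothetical values
      unsigned Block = 512 + (KCBank << 12);          // ConstantAddressBlock
      unsigned Ptr = ConstIndex * 16;                 // 16-byte aligned slot
      unsigned Encoded = (Ptr + 4 * Chan + Block * 16) / 4;
      assert(Encoded == ((Block + ConstIndex) << 2) + Chan);
      return 0;
    }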
++  if (LoadNode->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS) {
++    return SDValue();
++  }
++
++  // Lowering for indirect addressing
++  const MachineFunction &MF = DAG.getMachineFunction();
++  const AMDGPUFrameLowering *TFL = static_cast<const AMDGPUFrameLowering*>(
++                                         getTargetMachine().getFrameLowering());
++  unsigned StackWidth = TFL->getStackWidth(MF);
++
++  Ptr = stackPtrToRegIndex(Ptr, StackWidth, DAG);
++
++  if (VT.isVector()) {
++    unsigned NumElemVT = VT.getVectorNumElements();
++    EVT ElemVT = VT.getVectorElementType();
++    SDValue Loads[4];
++
++    assert(NumElemVT >= StackWidth && "Stack width cannot be greater than "
++                                      "vector width in load");
++
++    for (unsigned i = 0; i < NumElemVT; ++i) {
++      unsigned Channel, PtrIncr;
++      getStackAddress(StackWidth, i, Channel, PtrIncr);
++      Ptr = DAG.getNode(ISD::ADD, DL, MVT::i32, Ptr,
++                        DAG.getConstant(PtrIncr, MVT::i32));
++      Loads[i] = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, ElemVT,
++                             Chain, Ptr,
++                             DAG.getTargetConstant(Channel, MVT::i32),
++                             Op.getOperand(2));
++    }
++    for (unsigned i = NumElemVT; i < 4; ++i) {
++      Loads[i] = DAG.getUNDEF(ElemVT);
++    }
++    EVT TargetVT = EVT::getVectorVT(*DAG.getContext(), ElemVT, 4);
++    LoweredLoad = DAG.getNode(ISD::BUILD_VECTOR, DL, TargetVT, Loads, 4);
++  } else {
++    LoweredLoad = DAG.getNode(AMDGPUISD::REGISTER_LOAD, DL, VT,
++                              Chain, Ptr,
++                              DAG.getTargetConstant(0, MVT::i32), // Channel
++                              Op.getOperand(2));
++  }
++
++  SDValue Ops[2];
++  Ops[0] = LoweredLoad;
++  Ops[1] = Chain;
++
++  return DAG.getMergeValues(Ops, 2, DL);
++}
++
++SDValue R600TargetLowering::LowerFPOW(SDValue Op,
++    SelectionDAG &DAG) const {
++  DebugLoc DL = Op.getDebugLoc();
++  EVT VT = Op.getValueType();
++  SDValue LogBase = DAG.getNode(ISD::FLOG2, DL, VT, Op.getOperand(0));
++  SDValue MulLogBase = DAG.getNode(ISD::FMUL, DL, VT, Op.getOperand(1), LogBase);
++  return DAG.getNode(ISD::FEXP2, DL, VT, MulLogBase);
++}
++
++/// XXX Only kernel functions are supported, so we can assume for now that
++/// every function is a kernel function, but in the future we should use
++/// separate calling conventions for kernel and non-kernel functions.
++SDValue R600TargetLowering::LowerFormalArguments(
 +                                      SDValue Chain,
 +                                      CallingConv::ID CallConv,
 +                                      bool isVarArg,
@@ -14435,7 +15014,7 @@ index 0000000..d6b9d90
 +                                                    AMDGPUAS::PARAM_I_ADDRESS);
 +    SDValue Arg = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getRoot(),
 +                                DAG.getConstant(ParamOffsetBytes, MVT::i32),
-+                                       MachinePointerInfo(new Argument(PtrTy)),
++                                       MachinePointerInfo(UndefValue::get(PtrTy)),
 +                                       ArgVT, false, false, ArgBytes);
 +    InVals.push_back(Arg);
 +    ParamOffsetBytes += ArgBytes;
@@ -14466,15 +15045,94 @@ index 0000000..d6b9d90
 +      }
 +      break;
 +    }
++
++  // (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0 cc))) ->
++  // (i32 select_cc f32, f32, -1, 0 cc)
++  //
++  // Mesa's GLSL frontend generates the above pattern a lot and we can lower
++  // this to one of the SET*_DX10 instructions.
++  case ISD::FP_TO_SINT: {
++    SDValue FNeg = N->getOperand(0);
++    if (FNeg.getOpcode() != ISD::FNEG) {
++      return SDValue();
++    }
++    SDValue SelectCC = FNeg.getOperand(0);
++    if (SelectCC.getOpcode() != ISD::SELECT_CC ||
++        SelectCC.getOperand(0).getValueType() != MVT::f32 || // LHS
++        SelectCC.getOperand(2).getValueType() != MVT::f32 || // True
++        !isHWTrueValue(SelectCC.getOperand(2)) ||
++        !isHWFalseValue(SelectCC.getOperand(3))) {
++      return SDValue();
++    }
++
++    return DAG.getNode(ISD::SELECT_CC, N->getDebugLoc(), N->getValueType(0),
++                           SelectCC.getOperand(0), // LHS
++                           SelectCC.getOperand(1), // RHS
++                           DAG.getConstant(-1, MVT::i32), // True
++                           DAG.getConstant(0, MVT::i32),  // False
++                           SelectCC.getOperand(4)); // CC
++
++    break;
++  }
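A scalar sanity check of the fold, illustrative only: negating a 1.0f/0.0f selector and converting to a signed integer yields the same -1/0 values that a SET*_DX10 instruction produces directly:

    #include <cassert>

    int main() {
      for (int CC = 0; CC <= 1; ++CC) {
        float Sel = CC ? 1.0f : 0.0f;         // select_cc f32, f32, 1.0, 0.0
        int Folded = static_cast<int>(-Sel);  // fp_to_sint (fneg ...)
        int Direct = CC ? -1 : 0;             // select_cc ..., -1, 0
        assert(Folded == Direct);
      }
      return 0;
    }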
++  // Extract_vec (Build_vector) generated by custom lowering
++  // also needs a custom combine here
++  case ISD::EXTRACT_VECTOR_ELT: {
++    SDValue Arg = N->getOperand(0);
++    if (Arg.getOpcode() == ISD::BUILD_VECTOR) {
++      if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
++        unsigned Element = Const->getZExtValue();
++        return Arg->getOperand(Element);
++      }
++    }
++    if (Arg.getOpcode() == ISD::BITCAST &&
++        Arg.getOperand(0).getOpcode() == ISD::BUILD_VECTOR) {
++      if (ConstantSDNode *Const = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
++        unsigned Element = Const->getZExtValue();
++        return DAG.getNode(ISD::BITCAST, N->getDebugLoc(), N->getVTList(),
++            Arg->getOperand(0).getOperand(Element));
++      }
++    }
++  }
++
++  case ISD::SELECT_CC: {
++    // fold selectcc (selectcc x, y, a, b, cc), b, a, b, seteq ->
++    //      selectcc x, y, a, b, inv(cc)
++    SDValue LHS = N->getOperand(0);
++    if (LHS.getOpcode() != ISD::SELECT_CC) {
++      return SDValue();
++    }
++
++    SDValue RHS = N->getOperand(1);
++    SDValue True = N->getOperand(2);
++    SDValue False = N->getOperand(3);
++
++    if (LHS.getOperand(2).getNode() != True.getNode() ||
++        LHS.getOperand(3).getNode() != False.getNode() ||
++        RHS.getNode() != False.getNode() ||
++        cast<CondCodeSDNode>(N->getOperand(4))->get() != ISD::SETEQ) {
++      return SDValue();
++    }
++
++    ISD::CondCode CCOpcode = cast<CondCodeSDNode>(LHS->getOperand(4))->get();
++    CCOpcode = ISD::getSetCCInverse(
++                        CCOpcode, LHS.getOperand(0).getValueType().isInteger());
++    return DAG.getSelectCC(N->getDebugLoc(),
++                           LHS.getOperand(0),
++                           LHS.getOperand(1),
++                           LHS.getOperand(2),
++                           LHS.getOperand(3),
++                           CCOpcode);
++
++  }
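The identity behind this combine can be checked with plain integers. Illustrative only; it relies on the true and false values being distinct, which holds for the hardware values -1 and 0:

    #include <cassert>

    int main() {
      const int A = -1, B = 0;  // hardware true/false values
      for (int CC = 0; CC <= 1; ++CC) {
        int Inner = CC ? A : B;            // selectcc x, y, a, b, cc
        int Outer = (Inner == B) ? A : B;  // selectcc inner, b, a, b, seteq
        int Folded = CC ? B : A;           // selectcc x, y, a, b, inv(cc)
        assert(Outer == Folded);
      }
      return 0;
    }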
 +  }
 +  return SDValue();
 +}
 diff --git a/lib/Target/R600/R600ISelLowering.h b/lib/Target/R600/R600ISelLowering.h
 new file mode 100644
-index 0000000..2b954da
+index 0000000..afa3897
 --- /dev/null
 +++ b/lib/Target/R600/R600ISelLowering.h
-@@ -0,0 +1,72 @@
+@@ -0,0 +1,78 @@
 +//===-- R600ISelLowering.h - R600 DAG Lowering Interface -*- C++ -*--------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -14540,7 +15198,13 @@ index 0000000..2b954da
 +  SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
 +  SDValue LowerFPTOUINT(SDValue Op, SelectionDAG &DAG) const;
 +  SDValue LowerFPOW(SDValue Op, SelectionDAG &DAG) const;
-+  
++  SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
++  SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const;
++
++  SDValue stackPtrToRegIndex(SDValue Ptr, unsigned StackWidth,
++                                          SelectionDAG &DAG) const;
++  void getStackAddress(unsigned StackWidth, unsigned ElemIdx,
++                       unsigned &Channel, unsigned &PtrIncr) const;
 +  bool isZero(SDValue Op) const;
 +};
 +
@@ -14549,10 +15213,10 @@ index 0000000..2b954da
 +#endif // R600ISELLOWERING_H
 diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp
 new file mode 100644
-index 0000000..70ed41aba
+index 0000000..31671ea
 --- /dev/null
 +++ b/lib/Target/R600/R600InstrInfo.cpp
-@@ -0,0 +1,665 @@
+@@ -0,0 +1,776 @@
 +//===-- R600InstrInfo.cpp - R600 Instruction Information ------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -14571,8 +15235,12 @@ index 0000000..70ed41aba
 +#include "AMDGPUTargetMachine.h"
 +#include "AMDGPUSubtarget.h"
 +#include "R600Defines.h"
++#include "R600MachineFunctionInfo.h"
 +#include "R600RegisterInfo.h"
 +#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineFrameInfo.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/Instructions.h"
 +
 +#define GET_INSTRINFO_CTOR
 +#include "AMDGPUGenDFAPacketizer.inc"
@@ -14627,11 +15295,10 @@ index 0000000..70ed41aba
 +MachineInstr * R600InstrInfo::getMovImmInstr(MachineFunction *MF,
 +                                             unsigned DstReg, int64_t Imm) const {
 +  MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::MOV), DebugLoc());
-+  MachineInstrBuilder MIB(*MF, MI);
-+  MIB.addReg(DstReg, RegState::Define);
-+  MIB.addReg(AMDGPU::ALU_LITERAL_X);
-+  MIB.addImm(Imm);
-+  MIB.addReg(0); // PREDICATE_BIT
++  MachineInstrBuilder(MI).addReg(DstReg, RegState::Define);
++  MachineInstrBuilder(MI).addReg(AMDGPU::ALU_LITERAL_X);
++  MachineInstrBuilder(MI).addImm(Imm);
++  MachineInstrBuilder(MI).addReg(0); // PREDICATE_BIT
 +
 +  return MI;
 +}
@@ -14659,7 +15326,6 @@ index 0000000..70ed41aba
 +  switch (Opcode) {
 +  default: return false;
 +  case AMDGPU::RETURN:
-+  case AMDGPU::RESERVE_REG:
 +    return true;
 +  }
 +}
@@ -15005,8 +15671,7 @@ index 0000000..70ed41aba
 +  if (PIdx != -1) {
 +    MachineOperand &PMO = MI->getOperand(PIdx);
 +    PMO.setReg(Pred[2].getReg());
-+    MachineInstrBuilder MIB(*MI->getParent()->getParent(), MI);
-+    MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
++    MachineInstrBuilder(MI).addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);
 +    return true;
 +  }
 +
@@ -15021,6 +15686,124 @@ index 0000000..70ed41aba
 +  return 2;
 +}
 +
++int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
++  const MachineRegisterInfo &MRI = MF.getRegInfo();
++  const MachineFrameInfo *MFI = MF.getFrameInfo();
++  int Offset = 0;
++
++  if (MFI->getNumObjects() == 0) {
++    return -1;
++  }
++
++  if (MRI.livein_empty()) {
++    return 0;
++  }
++
++  for (MachineRegisterInfo::livein_iterator LI = MRI.livein_begin(),
++                                            LE = MRI.livein_end();
++                                            LI != LE; ++LI) {
++    Offset = std::max(Offset,
++                      GET_REG_INDEX(RI.getEncodingValue(LI->first)));
++  }
++
++  return Offset + 1;
++}
++
++int R600InstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
++  int Offset = 0;
++  const MachineFrameInfo *MFI = MF.getFrameInfo();
++
++  // Variable-sized objects are not supported
++  assert(!MFI->hasVarSizedObjects());
++
++  if (MFI->getNumObjects() == 0) {
++    return -1;
++  }
++
++  Offset = TM.getFrameLowering()->getFrameIndexOffset(MF, -1);
++
++  return getIndirectIndexBegin(MF) + Offset;
++}
++
++std::vector<unsigned> R600InstrInfo::getIndirectReservedRegs(
++                                             const MachineFunction &MF) const {
++  const AMDGPUFrameLowering *TFL =
++                 static_cast<const AMDGPUFrameLowering*>(TM.getFrameLowering());
++  std::vector<unsigned> Regs;
++
++  unsigned StackWidth = TFL->getStackWidth(MF);
++  int End = getIndirectIndexEnd(MF);
++
++  if (End == -1) {
++    return Regs;
++  }
++
++  for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {
++    unsigned SuperReg = AMDGPU::R600_Reg128RegClass.getRegister(Index);
++    Regs.push_back(SuperReg);
++    for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {
++      unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan);
++      Regs.push_back(Reg);
++    }
++  }
++  return Regs;
++}
++
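For a hypothetical index range and stack width, the reservation walk above marks one 128-bit super-register plus StackWidth 32-bit channels per index. A sketch of just the index arithmetic:

    #include <cstdio>

    int main() {
      int Begin = 2, End = 3;  // stand-ins for getIndirectIndexBegin/End
      unsigned StackWidth = 2;
      for (int Index = Begin; Index <= End; ++Index) {
        std::printf("reserve 128-bit reg %d\n", Index);
        for (unsigned Chan = 0; Chan < StackWidth; ++Chan)
          std::printf("  reserve 32-bit reg %d\n", 4 * Index + Chan);
      }
      return 0;
    }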
++unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex,
++                                                 unsigned Channel) const {
++  // XXX: Remove when we support a stack width > 2
++  assert(Channel == 0);
++  return RegIndex;
++}
++
++const TargetRegisterClass * R600InstrInfo::getIndirectAddrStoreRegClass(
++                                                     unsigned SourceReg) const {
++  return &AMDGPU::R600_TReg32RegClass;
++}
++
++const TargetRegisterClass *R600InstrInfo::getIndirectAddrLoadRegClass() const {
++  return &AMDGPU::TRegMemRegClass;
++}
++
++MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
++                                       MachineBasicBlock::iterator I,
++                                       unsigned ValueReg, unsigned Address,
++                                       unsigned OffsetReg) const {
++  unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
++  MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
++                                               AMDGPU::AR_X, OffsetReg);
++  setImmOperand(MOVA, R600Operands::WRITE, 0);
++
++  MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
++                                      AddrReg, ValueReg)
++                                      .addReg(AMDGPU::AR_X, RegState::Implicit);
++  setImmOperand(Mov, R600Operands::DST_REL, 1);
++  return Mov;
++}
++
++MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
++                                       MachineBasicBlock::iterator I,
++                                       unsigned ValueReg, unsigned Address,
++                                       unsigned OffsetReg) const {
++  unsigned AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address);
++  MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,
++                                                       AMDGPU::AR_X,
++                                                       OffsetReg);
++  setImmOperand(MOVA, R600Operands::WRITE, 0);
++  MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,
++                                      ValueReg,
++                                      AddrReg)
++                                      .addReg(AMDGPU::AR_X, RegState::Implicit);
++  setImmOperand(Mov, R600Operands::SRC0_REL, 1);
++
++  return Mov;
++}
++
++const TargetRegisterClass *R600InstrInfo::getSuperIndirectRegClass() const {
++  return &AMDGPU::IndirectRegRegClass;
++}
++
++
 +MachineInstrBuilder R600InstrInfo::buildDefaultInstruction(MachineBasicBlock &MBB,
 +                                                  MachineBasicBlock::iterator I,
 +                                                  unsigned Opcode,
@@ -15041,13 +15824,15 @@ index 0000000..70ed41aba
 +     .addReg(Src0Reg)  // $src0
 +     .addImm(0)        // $src0_neg
 +     .addImm(0)        // $src0_rel
-+     .addImm(0);       // $src0_abs
++     .addImm(0)        // $src0_abs
++     .addImm(-1);       // $src0_sel
 +
 +  if (Src1Reg) {
 +    MIB.addReg(Src1Reg) // $src1
 +       .addImm(0)       // $src1_neg
 +       .addImm(0)       // $src1_rel
-+       .addImm(0);       // $src1_abs
++       .addImm(0)       // $src1_abs
++       .addImm(-1);      // $src1_sel
 +  }
 +
 +  //XXX: The r600g finalizer expects this to be 1, once we've moved the
@@ -15076,16 +15861,6 @@ index 0000000..70ed41aba
 +
 +int R600InstrInfo::getOperandIdx(unsigned Opcode,
 +                                 R600Operands::Ops Op) const {
-+  const static int OpTable[3][R600Operands::COUNT] = {
-+//            W        C     S  S  S     S  S  S     S  S
-+//            R  O  D  L  S  R  R  R  S  R  R  R  S  R  R  L  P
-+//   D  U     I  M  R  A  R  C  C  C  C  C  C  C  R  C  C  A  R  I
-+//   S  E  U  T  O  E  M  C  0  0  0  C  1  1  1  C  2  2  S  E  M
-+//   T  M  P  E  D  L  P  0  N  R  A  1  N  R  A  2  N  R  T  D  M
-+    {0,-1,-1, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1,-1, 9,10,11},
-+    {0, 1, 2, 3, 4 ,5 ,6 ,7, 8, 9,10,11,12,-1,-1,-1,13,14,15,16,17},
-+    {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8,-1, 9,10,11,12,13,14}
-+  };
 +  unsigned TargetFlags = get(Opcode).TSFlags;
 +  unsigned OpTableIdx;
 +
@@ -15111,7 +15886,7 @@ index 0000000..70ed41aba
 +    OpTableIdx = 2;
 +  }
 +
-+  return OpTable[OpTableIdx][Op];
++  return R600Operands::ALUOpTable[OpTableIdx][Op];
 +}
 +
 +void R600InstrInfo::setImmOperand(MachineInstr *MI, R600Operands::Ops Op,
@@ -15220,10 +15995,10 @@ index 0000000..70ed41aba
 +}
 diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
 new file mode 100644
-index 0000000..6bb0ca9
+index 0000000..278fad1
 --- /dev/null
 +++ b/lib/Target/R600/R600InstrInfo.h
-@@ -0,0 +1,169 @@
+@@ -0,0 +1,201 @@
 +//===-- R600InstrInfo.h - R600 Instruction Info Interface -------*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -15340,6 +16115,38 @@ index 0000000..6bb0ca9
 +  virtual int getInstrLatency(const InstrItineraryData *ItinData,
 +                              SDNode *Node) const { return 1;}
 +
++  /// \returns a list of all the registers that may be accessed using indirect
++  /// addressing.
++  std::vector<unsigned> getIndirectReservedRegs(const MachineFunction &MF) const;
++
++  virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
++
++  virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
++
++
++  virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++                                            unsigned Channel) const;
++
++  virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++                                                      unsigned SourceReg) const;
++
++  virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
++
++  virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++                                  MachineBasicBlock::iterator I,
++                                  unsigned ValueReg, unsigned Address,
++                                  unsigned OffsetReg) const;
++
++  virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++                                  MachineBasicBlock::iterator I,
++                                  unsigned ValueReg, unsigned Address,
++                                  unsigned OffsetReg) const;
++
++  virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
++
++
++  /// buildDefaultInstruction - This function returns a MachineInstr with
++  /// all the instruction modifiers initialized to their default values.
 +  /// You can use this function to avoid manually specifying each instruction
 +  /// modifier operand when building a new instruction.
 +  ///
@@ -15395,10 +16202,10 @@ index 0000000..6bb0ca9
 +#endif // R600INSTRINFO_H_
 diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
 new file mode 100644
-index 0000000..64bab18
+index 0000000..409da07
 --- /dev/null
 +++ b/lib/Target/R600/R600Instructions.td
-@@ -0,0 +1,1724 @@
+@@ -0,0 +1,1976 @@
 +//===-- R600Instructions.td - R600 Instruction defs  -------*- tablegen -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -15471,6 +16278,11 @@ index 0000000..64bab18
 +  let PrintMethod = PM;
 +}
 +
++// src_sel for ALU src operands, see also ALU_CONST, ALU_PARAM registers
++def SEL : OperandWithDefaultOps <i32, (ops (i32 -1))> {
++  let PrintMethod = "printSel";
++}
++
 +def LITERAL : InstFlag<"printLiteral">;
 +
 +def WRITE : InstFlag <"printWrite", 1>;
@@ -15487,9 +16299,16 @@ index 0000000..64bab18
 +// default to 0.
 +def LAST : InstFlag<"printLast", 1>;
 +
++def FRAMEri : Operand<iPTR> {
++  let MIOperandInfo = (ops R600_Reg32:$ptr, i32imm:$index);
++}
++
 +def ADDRParam : ComplexPattern<i32, 2, "SelectADDRParam", [], []>;
 +def ADDRDWord : ComplexPattern<i32, 1, "SelectADDRDWord", [], []>;
 +def ADDRVTX_READ : ComplexPattern<i32, 2, "SelectADDRVTX_READ", [], []>;
++def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;
++def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;
++def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;
 +
 +class R600ALU_Word0 {
 +  field bits<32> Word0;
@@ -15574,6 +16393,55 @@ index 0000000..64bab18
 +  let Word1{17-13} = alu_inst;
 +}
 +
++class VTX_WORD0 {
++  field bits<32> Word0;
++  bits<7> SRC_GPR;
++  bits<5> VC_INST;
++  bits<2> FETCH_TYPE;
++  bits<1> FETCH_WHOLE_QUAD;
++  bits<8> BUFFER_ID;
++  bits<1> SRC_REL;
++  bits<2> SRC_SEL_X;
++  bits<6> MEGA_FETCH_COUNT;
++
++  let Word0{4-0}   = VC_INST;
++  let Word0{6-5}   = FETCH_TYPE;
++  let Word0{7}     = FETCH_WHOLE_QUAD;
++  let Word0{15-8}  = BUFFER_ID;
++  let Word0{22-16} = SRC_GPR;
++  let Word0{23}    = SRC_REL;
++  let Word0{25-24} = SRC_SEL_X;
++  let Word0{31-26} = MEGA_FETCH_COUNT;
++}
++
++class VTX_WORD1_GPR {
++  field bits<32> Word1;
++  bits<7> DST_GPR;
++  bits<1> DST_REL;
++  bits<3> DST_SEL_X;
++  bits<3> DST_SEL_Y;
++  bits<3> DST_SEL_Z;
++  bits<3> DST_SEL_W;
++  bits<1> USE_CONST_FIELDS;
++  bits<6> DATA_FORMAT;
++  bits<2> NUM_FORMAT_ALL;
++  bits<1> FORMAT_COMP_ALL;
++  bits<1> SRF_MODE_ALL;
++
++  let Word1{6-0} = DST_GPR;
++  let Word1{7}    = DST_REL;
++  let Word1{8}    = 0; // Reserved
++  let Word1{11-9} = DST_SEL_X;
++  let Word1{14-12} = DST_SEL_Y;
++  let Word1{17-15} = DST_SEL_Z;
++  let Word1{20-18} = DST_SEL_W;
++  let Word1{21}    = USE_CONST_FIELDS;
++  let Word1{27-22} = DATA_FORMAT;
++  let Word1{29-28} = NUM_FORMAT_ALL;
++  let Word1{30}    = FORMAT_COMP_ALL;
++  let Word1{31}    = SRF_MODE_ALL;
++}
++
 +/*
 +XXX: R600 subtarget uses a slightly different encoding than the other
 +subtargets.  We currently handle this in R600MCCodeEmitter, but we may
@@ -15615,11 +16483,11 @@ index 0000000..64bab18
 +    InstR600 <0,
 +              (outs R600_Reg32:$dst),
 +              (ins WRITE:$write, OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
-+                   R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
++                   R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
 +                   LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
 +              !strconcat(opName,
 +                   "$clamp $dst$write$dst_rel$omod, "
-+                   "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
++                   "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
 +                   "$literal $pred_sel$last"),
 +              pattern,
 +              itin>,
@@ -15655,13 +16523,13 @@ index 0000000..64bab18
 +          (outs R600_Reg32:$dst),
 +          (ins UEM:$update_exec_mask, UP:$update_pred, WRITE:$write,
 +               OMOD:$omod, REL:$dst_rel, CLAMP:$clamp,
-+               R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs,
-+               R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs,
++               R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, ABS:$src0_abs, SEL:$src0_sel,
++               R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, ABS:$src1_abs, SEL:$src1_sel,
 +               LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
 +          !strconcat(opName,
 +                "$clamp $update_exec_mask$update_pred$dst$write$dst_rel$omod, "
-+                "$src0_neg$src0_abs$src0$src0_abs$src0_rel, "
-+                "$src1_neg$src1_abs$src1$src1_abs$src1_rel, "
++                "$src0_neg$src0_abs$src0$src0_sel$src0_abs$src0_rel, "
++                "$src1_neg$src1_abs$src1$src1_sel$src1_abs$src1_rel, "
 +                "$literal $pred_sel$last"),
 +          pattern,
 +          itin>,
@@ -15692,14 +16560,14 @@ index 0000000..64bab18
 +  InstR600 <0,
 +          (outs R600_Reg32:$dst),
 +          (ins REL:$dst_rel, CLAMP:$clamp,
-+               R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel,
-+               R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel,
-+               R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel,
++               R600_Reg32:$src0, NEG:$src0_neg, REL:$src0_rel, SEL:$src0_sel,
++               R600_Reg32:$src1, NEG:$src1_neg, REL:$src1_rel, SEL:$src1_sel,
++               R600_Reg32:$src2, NEG:$src2_neg, REL:$src2_rel, SEL:$src2_sel,
 +               LAST:$last, R600_Pred:$pred_sel, LITERAL:$literal),
 +          !strconcat(opName, "$clamp $dst$dst_rel, "
-+                             "$src0_neg$src0$src0_rel, "
-+                             "$src1_neg$src1$src1_rel, "
-+                             "$src2_neg$src2$src2_rel, "
++                             "$src0_neg$src0$src0_sel$src0_rel, "
++                             "$src1_neg$src1$src1_sel$src1_rel, "
++                             "$src2_neg$src2$src2_sel$src2_rel, "
 +                             "$literal $pred_sel$last"),
 +          pattern,
 +          itin>,
@@ -15743,6 +16611,27 @@ index 0000000..64bab18
 +  }]
 +>;
 +
++def TEX_RECT : PatLeaf<
++  (imm),
++  [{uint32_t TType = (uint32_t)N->getZExtValue();
++    return TType == 5;
++  }]
++>;
++
++def TEX_ARRAY : PatLeaf<
++  (imm),
++  [{uint32_t TType = (uint32_t)N->getZExtValue();
++    return TType == 9 || TType == 10 || TType == 15 || TType == 16;
++  }]
++>;
++
++def TEX_SHADOW_ARRAY : PatLeaf<
++  (imm),
++  [{uint32_t TType = (uint32_t)N->getZExtValue();
++    return TType == 11 || TType == 12 || TType == 17;
++  }]
++>;
++
 +class EG_CF_RAT <bits <8> cf_inst, bits <6> rat_inst, bits<4> rat_id, dag outs,
 +                 dag ins, string asm, list<dag> pattern> :
 +    InstR600ISA <outs, ins, asm, pattern> {
@@ -15815,32 +16704,35 @@ index 0000000..64bab18
 +                     "Subtarget.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX">;
 +
 +//===----------------------------------------------------------------------===//
-+// Interpolation Instructions
++// R600 SDNodes
 +//===----------------------------------------------------------------------===//
 +
-+def INTERP: SDNode<"AMDGPUISD::INTERP",
-+  SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisInt<1>, SDTCisInt<2>]>
-+  >;
-+
-+def INTERP_P0: SDNode<"AMDGPUISD::INTERP_P0",
-+  SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisInt<1>]>
-+  >;
++def INTERP_PAIR_XY :  AMDGPUShaderInst <
++  (outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),
++  (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
++  "INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 $dst1",
++  []>;
++
++def INTERP_PAIR_ZW :  AMDGPUShaderInst <
++  (outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),
++  (ins i32imm:$src0, R600_Reg32:$src1, R600_Reg32:$src2),
++  "INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 $dst1",
++  []>;
++
++def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
++  SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
++  [SDNPMayLoad]
++>;
 +
-+let usesCustomInserter = 1 in {
-+def input_perspective :  AMDGPUShaderInst <
-+  (outs R600_Reg128:$dst),
-+  (ins i32imm:$src0, i32imm:$src1),
-+  "input_perspective $src0 $src1 : dst",
-+  [(set R600_Reg128:$dst, (INTERP (i32 imm:$src0), (i32 imm:$src1)))]>;
-+}  // End usesCustomInserter = 1
++//===----------------------------------------------------------------------===//
++// Interpolation Instructions
++//===----------------------------------------------------------------------===//
 +
-+def input_constant :  AMDGPUShaderInst <
++def INTERP_VEC_LOAD :  AMDGPUShaderInst <
 +  (outs R600_Reg128:$dst),
-+  (ins i32imm:$src),
-+  "input_perspective $src : dst",
-+  [(set R600_Reg128:$dst, (INTERP_P0 (i32 imm:$src)))]>;
-+
-+
++  (ins i32imm:$src0),
++  "INTERP_LOAD $src0 : $dst",
++  []>;
 +
 +def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {
 +  let bank_swizzle = 5;
@@ -15908,19 +16800,24 @@ index 0000000..64bab18
 +multiclass ExportPattern<Instruction ExportInst, bits<8> cf_inst> {
 +  def : Pat<(int_R600_store_pixel_depth R600_Reg32:$reg),
 +    (ExportInst
-+        (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
++        (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
 +        0, 61, 0, 7, 7, 7, cf_inst, 0)
 +  >;
 +
 +  def : Pat<(int_R600_store_pixel_stencil R600_Reg32:$reg),
 +    (ExportInst
-+        (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sel_x),
++        (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)), R600_Reg32:$reg, sub0),
 +        0, 61, 7, 0, 7, 7, cf_inst, 0)
 +  >;
 +
-+  def : Pat<(int_R600_store_pixel_dummy),
++  def : Pat<(int_R600_store_dummy (i32 imm:$type)),
++    (ExportInst
++        (v4f32 (IMPLICIT_DEF)), imm:$type, 0, 7, 7, 7, 7, cf_inst, 0)
++  >;
++
++  def : Pat<(int_R600_store_dummy 1),
 +    (ExportInst
-+        (v4f32 (IMPLICIT_DEF)), 0, 0, 7, 7, 7, 7, cf_inst, 0)
++        (v4f32 (IMPLICIT_DEF)), 1, 60, 7, 7, 7, 7, cf_inst, 0)
 +  >;
 +
 +  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 0),
@@ -15928,29 +16825,40 @@ index 0000000..64bab18
 +        (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
 +        0, 1, 2, 3, cf_inst, 0)
 +  >;
++  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
++    (i32 imm:$type), (i32 imm:$arraybase), (i32 imm)),
++        (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++        0, 1, 2, 3, cf_inst, 0)
++  >;
++
++  def : Pat<(int_R600_store_swizzle (v4f32 R600_Reg128:$src), imm:$arraybase,
++      imm:$type),
++    (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++        0, 1, 2, 3, cf_inst, 0)
++  >;
 +}
 +
 +multiclass SteamOutputExportPattern<Instruction ExportInst,
 +    bits<8> buf0inst, bits<8> buf1inst, bits<8> buf2inst, bits<8> buf3inst> {
 +// Stream0
-+  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 1),
-+      (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+      (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++  def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++      (i32 imm:$arraybase), (i32 0), (i32 imm:$mask)),
++      (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
 +      4095, imm:$mask, buf0inst, 0)>;
 +// Stream1
-+  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 2),
-+      (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+      (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++  def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++      (i32 imm:$arraybase), (i32 1), (i32 imm:$mask)),
++      (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
 +      4095, imm:$mask, buf1inst, 0)>;
 +// Stream2
-+  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 3),
-+      (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+      (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++  def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++      (i32 imm:$arraybase), (i32 2), (i32 imm:$mask)),
++      (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
 +      4095, imm:$mask, buf2inst, 0)>;
 +// Stream3
-+  def : Pat<(EXPORT (v4f32 R600_Reg128:$src), (i32 4),
-+      (i32 imm:$type), (i32 imm:$arraybase), (i32 imm:$mask)),
-+      (ExportInst R600_Reg128:$src, imm:$type, imm:$arraybase,
++  def : Pat<(int_R600_store_stream_output (v4f32 R600_Reg128:$src),
++      (i32 imm:$arraybase), (i32 3), (i32 imm:$mask)),
++      (ExportInst R600_Reg128:$src, 0, imm:$arraybase,
 +      4095, imm:$mask, buf3inst, 0)>;
 +}
 +
@@ -16025,6 +16933,34 @@ index 0000000..64bab18
 +    COND_NE))]
 +>;
 +
++def SETE_DX10 : R600_2OP <
++  0xC, "SETE_DX10",
++  [(set R600_Reg32:$dst,
++   (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++    COND_EQ))]
++>;
++
++def SETGT_DX10 : R600_2OP <
++  0xD, "SETGT_DX10",
++  [(set R600_Reg32:$dst,
++   (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++    COND_GT))]
++>;
++
++def SETGE_DX10 : R600_2OP <
++  0xE, "SETGE_DX10",
++  [(set R600_Reg32:$dst,
++   (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++    COND_GE))]
++>;
++
++def SETNE_DX10 : R600_2OP <
++  0xF, "SETNE_DX10",
++  [(set R600_Reg32:$dst,
++    (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, (i32 -1), (i32 0),
++     COND_NE))]
++>;
++
 +def FRACT : R600_1OP_Helper <0x10, "FRACT", AMDGPUfract>;
 +def TRUNC : R600_1OP_Helper <0x11, "TRUNC", int_AMDGPU_trunc>;
 +def CEIL : R600_1OP_Helper <0x12, "CEIL", fceil>;
@@ -16085,7 +17021,7 @@ index 0000000..64bab18
 +>;
 +
 +def SETGT_INT : R600_2OP <
-+  0x3B, "SGT_INT",
++  0x3B, "SETGT_INT",
 +  [(set (i32 R600_Reg32:$dst),
 +   (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETGT))]
 +>;
@@ -16539,6 +17475,10 @@ index 0000000..64bab18
 +  defm DOT4_eg : DOT4_Common<0xBE>;
 +  defm CUBE_eg : CUBE_Common<0xC0>;
 +
++let hasSideEffects = 1 in {
++  def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", []>;
++}
++
 +  def TGSI_LIT_Z_eg : TGSI_LIT_Z_Common<MUL_LIT_eg, LOG_CLAMPED_eg, EXP_IEEE_eg>;
 +
 +  def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
@@ -16629,37 +17569,30 @@ index 0000000..64bab18
 +>;
 +
 +class VTX_READ_eg <string name, bits<8> buffer_id, dag outs, list<dag> pattern>
-+    : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern> {
-+
-+  // Operands
-+  bits<7> DST_GPR;
-+  bits<7> SRC_GPR;
++    : InstR600ISA <outs, (ins MEMxi:$ptr), name#" $dst, $ptr", pattern>,
++      VTX_WORD1_GPR, VTX_WORD0 {
 +
 +  // Static fields
-+  bits<5> VC_INST = 0;
-+  bits<2> FETCH_TYPE = 2;
-+  bits<1> FETCH_WHOLE_QUAD = 0;
-+  bits<8> BUFFER_ID = buffer_id;
-+  bits<1> SRC_REL = 0;
++  let VC_INST = 0;
++  let FETCH_TYPE = 2;
++  let FETCH_WHOLE_QUAD = 0;
++  let BUFFER_ID = buffer_id;
++  let SRC_REL = 0;
 +  // XXX: We can infer this field based on the SRC_GPR.  This would allow us
 +  // to store vertex addresses in any channel, not just X.
-+  bits<2> SRC_SEL_X = 0;
-+  bits<6> MEGA_FETCH_COUNT;
-+  bits<1> DST_REL = 0;
-+  bits<3> DST_SEL_X;
-+  bits<3> DST_SEL_Y;
-+  bits<3> DST_SEL_Z;
-+  bits<3> DST_SEL_W;
++  let SRC_SEL_X = 0;
++  let DST_REL = 0;
 +  // The docs say that if this bit is set, then the DATA_FORMAT, NUM_FORMAT_ALL,
 +  // FORMAT_COMP_ALL, SRF_MODE_ALL, and ENDIAN_SWAP fields will be ignored;
 +  // however, based on my testing, if USE_CONST_FIELDS is set then all of
 +  // these fields need to be set to 0.
-+  bits<1> USE_CONST_FIELDS = 0;
-+  bits<6> DATA_FORMAT;
-+  bits<2> NUM_FORMAT_ALL = 1;
-+  bits<1> FORMAT_COMP_ALL = 0;
-+  bits<1> SRF_MODE_ALL = 0;
++  let USE_CONST_FIELDS = 0;
++  let NUM_FORMAT_ALL = 1;
++  let FORMAT_COMP_ALL = 0;
++  let SRF_MODE_ALL = 0;
 +
++  let Inst{31-0} = Word0;
++  let Inst{63-32} = Word1;
 +  // LLVM can only encode 64-bit instructions, so these fields are manually
 +  // encoded in R600CodeEmitter
 +  //
@@ -16670,29 +17603,7 @@ index 0000000..64bab18
 +  // bits<1>  ALT_CONST = 0;
 +  // bits<2>  BUFFER_INDEX_MODE = 0;
 +
-+  // VTX_WORD0
-+  let Inst{4-0}   = VC_INST;
-+  let Inst{6-5}   = FETCH_TYPE;
-+  let Inst{7}     = FETCH_WHOLE_QUAD;
-+  let Inst{15-8}  = BUFFER_ID;
-+  let Inst{22-16} = SRC_GPR;
-+  let Inst{23}    = SRC_REL;
-+  let Inst{25-24} = SRC_SEL_X;
-+  let Inst{31-26} = MEGA_FETCH_COUNT;
-+
-+  // VTX_WORD1_GPR
-+  let Inst{38-32} = DST_GPR;
-+  let Inst{39}    = DST_REL;
-+  let Inst{40}    = 0; // Reserved
-+  let Inst{43-41} = DST_SEL_X;
-+  let Inst{46-44} = DST_SEL_Y;
-+  let Inst{49-47} = DST_SEL_Z;
-+  let Inst{52-50} = DST_SEL_W;
-+  let Inst{53}    = USE_CONST_FIELDS;
-+  let Inst{59-54} = DATA_FORMAT;
-+  let Inst{61-60} = NUM_FORMAT_ALL;
-+  let Inst{62}    = FORMAT_COMP_ALL;
-+  let Inst{63}    = SRF_MODE_ALL;
++
 +
 +  // VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
 +  // is done in R600CodeEmitter)
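The explicit field layout deleted above is presumably what the shared VTX_WORD0 /
VTX_WORD1_GPR classes now carry. As a quick cross-check of the WORD0 half, the
same packing can be sketched in C++; the helper below is a hypothetical
illustration reconstructed from the deleted Inst{} assignments, not the actual
R600CodeEmitter API:

    #include <cstdint>

    // Pack VTX_WORD0 as the deleted layout places it: bits 4-0 VC_INST,
    // 6-5 FETCH_TYPE, 7 FETCH_WHOLE_QUAD, 15-8 BUFFER_ID, 22-16 SRC_GPR,
    // 23 SRC_REL, 25-24 SRC_SEL_X, 31-26 MEGA_FETCH_COUNT.
    static uint32_t packVtxWord0(uint32_t VcInst, uint32_t FetchType,
                                 uint32_t FetchWholeQuad, uint32_t BufferId,
                                 uint32_t SrcGpr, uint32_t SrcRel,
                                 uint32_t SrcSelX, uint32_t MegaFetchCount) {
      return  (VcInst         & 0x1Fu)
           | ((FetchType      & 0x3u)  << 5)
           | ((FetchWholeQuad & 0x1u)  << 7)
           | ((BufferId       & 0xFFu) << 8)
           | ((SrcGpr         & 0x7Fu) << 16)
           | ((SrcRel         & 0x1u)  << 23)
           | ((SrcSelX        & 0x3u)  << 24)
           | ((MegaFetchCount & 0x3Fu) << 26);
    }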
@@ -16788,6 +17699,10 @@ index 0000000..64bab18
 +  [(set (i32 R600_TReg32_X:$dst), (load_param ADDRVTX_READ:$ptr))]
 +>;
 +
++def VTX_READ_PARAM_128_eg : VTX_READ_128_eg <0,
++  [(set (v4i32 R600_Reg128:$dst), (load_param ADDRVTX_READ:$ptr))]
++>;
++
 +//===----------------------------------------------------------------------===//
 +// VTX Read from global memory space
 +//===----------------------------------------------------------------------===//
@@ -16818,6 +17733,12 @@ index 0000000..64bab18
 +
 +}
 +
++//===----------------------------------------------------------------------===//
++// Register loads and stores - for indirect addressing
++//===----------------------------------------------------------------------===//
++
++defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;
++
 +let Predicates = [isCayman] in {
 +
 +let isVector = 1 in { 
@@ -16877,6 +17798,7 @@ index 0000000..64bab18
 +  (ins R600_Reg32:$src0, i32imm:$src1, i32imm:$flags),
 +  "", [], NullALU> {
 +  let FlagOperandIdx = 3;
++  let isTerminator = 1;
 +}
 +
 +let isTerminator = 1, isBranch = 1, isBarrier = 1 in {
@@ -16903,19 +17825,6 @@ index 0000000..64bab18
 +
 +} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1
 +
-+def R600_LOAD_CONST : AMDGPUShaderInst <
-+  (outs R600_Reg32:$dst),
-+  (ins i32imm:$src0),
-+  "R600_LOAD_CONST $dst, $src0",
-+  [(set R600_Reg32:$dst, (int_AMDGPU_load_const imm:$src0))]
-+>;
-+
-+def RESERVE_REG : AMDGPUShaderInst <
-+  (outs),
-+  (ins i32imm:$src),
-+  "RESERVE_REG $src",
-+  [(int_AMDGPU_reserve_reg imm:$src)]
-+>;
 +
 +def TXD: AMDGPUShaderInst <
 +  (outs R600_Reg128:$dst),
@@ -16946,22 +17855,148 @@ index 0000000..64bab18
 +      "RETURN", [(IL_retflag)]>;
 +}
 +
-+//===--------------------------------------------------------------------===//
-+// Instructions support
-+//===--------------------------------------------------------------------===//
-+//===---------------------------------------------------------------------===//
-+// Custom Inserter for Branches and returns, this eventually will be a
-+// seperate pass
-+//===---------------------------------------------------------------------===//
-+let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
-+  def BRANCH : ILFormat<(outs), (ins brtarget:$target),
-+      "; Pseudo unconditional branch instruction",
-+      [(br bb:$target)]>;
-+  defm BRANCH_COND : BranchConditional<IL_brcond>;
-+}
 +
-+//===---------------------------------------------------------------------===//
-+// Flow and Program control Instructions
++//===----------------------------------------------------------------------===//
++// Constant Buffer Addressing Support
++//===----------------------------------------------------------------------===//
++
++let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"  in {
++def CONST_COPY : Instruction {
++  let OutOperandList = (outs R600_Reg32:$dst);
++  let InOperandList = (ins i32imm:$src);
++  let Pattern = [(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
++  let AsmString = "CONST_COPY";
++  let neverHasSideEffects = 1;
++  let isAsCheapAsAMove = 1;
++  let Itinerary = NullALU;
++}
++} // end isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"
++
++def TEX_VTX_CONSTBUF :
++  InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr), "VTX_READ_eg $dst, $ptr",
++      [(set R600_Reg128:$dst, (CONST_ADDRESS ADDRGA_VAR_OFFSET:$ptr))]>,
++  VTX_WORD1_GPR, VTX_WORD0 {
++
++  let VC_INST = 0;
++  let FETCH_TYPE = 2;
++  let FETCH_WHOLE_QUAD = 0;
++  let BUFFER_ID = 0;
++  let SRC_REL = 0;
++  let SRC_SEL_X = 0;
++  let DST_REL = 0;
++  let USE_CONST_FIELDS = 0;
++  let NUM_FORMAT_ALL = 2;
++  let FORMAT_COMP_ALL = 1;
++  let SRF_MODE_ALL = 1;
++  let MEGA_FETCH_COUNT = 16;
++  let DST_SEL_X        = 0;
++  let DST_SEL_Y        = 1;
++  let DST_SEL_Z        = 2;
++  let DST_SEL_W        = 3;
++  let DATA_FORMAT      = 35;
++
++  let Inst{31-0} = Word0;
++  let Inst{63-32} = Word1;
++
++// LLVM can only encode 64-bit instructions, so these fields are manually
++// encoded in R600CodeEmitter
++//
++// bits<16> OFFSET;
++// bits<2>  ENDIAN_SWAP = 0;
++// bits<1>  CONST_BUF_NO_STRIDE = 0;
++// bits<1>  MEGA_FETCH = 0;
++// bits<1>  ALT_CONST = 0;
++// bits<2>  BUFFER_INDEX_MODE = 0;
++
++
++
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
++// is done in R600CodeEmitter)
++//
++// Inst{79-64} = OFFSET;
++// Inst{81-80} = ENDIAN_SWAP;
++// Inst{82}    = CONST_BUF_NO_STRIDE;
++// Inst{83}    = MEGA_FETCH;
++// Inst{84}    = ALT_CONST;
++// Inst{86-85} = BUFFER_INDEX_MODE;
++// Inst{95-86} = 0; Reserved
++
++// VTX_WORD3 (Padding)
++//
++// Inst{127-96} = 0;
++}
++
++def TEX_VTX_TEXBUF:
++  InstR600ISA <(outs R600_Reg128:$dst), (ins MEMxi:$ptr, i32imm:$BUFFER_ID), "TEX_VTX_EXPLICIT_READ $dst, $ptr",
++      [(set R600_Reg128:$dst, (int_R600_load_texbuf ADDRGA_VAR_OFFSET:$ptr, imm:$BUFFER_ID))]>,
++VTX_WORD1_GPR, VTX_WORD0 {
++
++let VC_INST = 0;
++let FETCH_TYPE = 2;
++let FETCH_WHOLE_QUAD = 0;
++let SRC_REL = 0;
++let SRC_SEL_X = 0;
++let DST_REL = 0;
++let USE_CONST_FIELDS = 1;
++let NUM_FORMAT_ALL = 0;
++let FORMAT_COMP_ALL = 0;
++let SRF_MODE_ALL = 1;
++let MEGA_FETCH_COUNT = 16;
++let DST_SEL_X        = 0;
++let DST_SEL_Y        = 1;
++let DST_SEL_Z        = 2;
++let DST_SEL_W        = 3;
++let DATA_FORMAT      = 0;
++
++let Inst{31-0} = Word0;
++let Inst{63-32} = Word1;
++
++// LLVM can only encode 64-bit instructions, so these fields are manually
++// encoded in R600CodeEmitter
++//
++// bits<16> OFFSET;
++// bits<2>  ENDIAN_SWAP = 0;
++// bits<1>  CONST_BUF_NO_STRIDE = 0;
++// bits<1>  MEGA_FETCH = 0;
++// bits<1>  ALT_CONST = 0;
++// bits<2>  BUFFER_INDEX_MODE = 0;
++
++
++
++// VTX_WORD2 (LLVM can only encode 64-bit instructions, so WORD2 encoding
++// is done in R600CodeEmitter)
++//
++// Inst{79-64} = OFFSET;
++// Inst{81-80} = ENDIAN_SWAP;
++// Inst{82}    = CONST_BUF_NO_STRIDE;
++// Inst{83}    = MEGA_FETCH;
++// Inst{84}    = ALT_CONST;
++// Inst{86-85} = BUFFER_INDEX_MODE;
++// Inst{95-86} = 0; Reserved
++
++// VTX_WORD3 (Padding)
++//
++// Inst{127-96} = 0;
++}
++
++
++
++//===--------------------------------------------------------------------===//
++// Instructions support
++//===--------------------------------------------------------------------===//
++//===---------------------------------------------------------------------===//
++// Custom inserter for branches and returns; this will eventually be a
++// separate pass
++//===---------------------------------------------------------------------===//
++let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {
++  def BRANCH : ILFormat<(outs), (ins brtarget:$target),
++      "; Pseudo unconditional branch instruction",
++      [(br bb:$target)]>;
++  defm BRANCH_COND : BranchConditional<IL_brcond>;
++}
++
++//===---------------------------------------------------------------------===//
++// Flow and Program control Instructions
 +//===---------------------------------------------------------------------===//
 +let isTerminator=1 in {
 +  def SWITCH      : ILFormat< (outs), (ins GPRI32:$src),
@@ -17045,6 +18080,18 @@ index 0000000..64bab18
 +  (SGE R600_Reg32:$src1, R600_Reg32:$src0) 
 +>;
 +
++// SETGT_DX10 reverse args
++def : Pat <
++  (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LT),
++  (SETGT_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
++>;
++
++// SETGE_DX10 reverse args
++def : Pat <
++  (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, COND_LE),
++  (SETGE_DX10 R600_Reg32:$src1, R600_Reg32:$src0)
++>;
++
 +// SETGT_INT reverse args
 +def : Pat <
 +  (selectcc (i32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETLT),
@@ -17083,31 +18130,43 @@ index 0000000..64bab18
 +  (SETE R600_Reg32:$src0, R600_Reg32:$src1)
 +>;
 +
++//SETE_DX10 - 'true if ordered'
++def : Pat <
++  (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETO),
++  (SETE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
++>;
++
 +//SNE - 'true if unordered'
 +def : Pat <
 +  (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, FP_ONE, FP_ZERO, SETUO),
 +  (SNE R600_Reg32:$src0, R600_Reg32:$src1)
 +>;
 +
-+def : Extract_Element <f32, v4f32, R600_Reg128, 0, sel_x>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 1, sel_y>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 2, sel_z>;
-+def : Extract_Element <f32, v4f32, R600_Reg128, 3, sel_w>;
++//SETNE_DX10 - 'true if unordered'
++def : Pat <
++  (selectcc (f32 R600_Reg32:$src0), R600_Reg32:$src1, -1, 0, SETUO),
++  (SETNE_DX10 R600_Reg32:$src0, R600_Reg32:$src1)
++>;
 +
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sel_x>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sel_y>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sel_z>;
-+def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sel_w>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 0, sub0>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 1, sub1>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 2, sub2>;
++def : Extract_Element <f32, v4f32, R600_Reg128, 3, sub3>;
 +
-+def : Extract_Element <i32, v4i32, R600_Reg128, 0, sel_x>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 1, sel_y>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 2, sel_z>;
-+def : Extract_Element <i32, v4i32, R600_Reg128, 3, sel_w>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 0, sub0>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 1, sub1>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 2, sub2>;
++def : Insert_Element <f32, v4f32, R600_Reg32, R600_Reg128, 3, sub3>;
 +
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sel_x>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sel_y>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sel_z>;
-+def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sel_w>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 0, sub0>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 1, sub1>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 2, sub2>;
++def : Extract_Element <i32, v4i32, R600_Reg128, 3, sub3>;
++
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 0, sub0>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 1, sub1>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 2, sub2>;
++def : Insert_Element <i32, v4i32, R600_Reg32, R600_Reg128, 3, sub3>;
 +
 +def : Vector_Build <v4f32, R600_Reg128, f32, R600_Reg32>;
 +def : Vector_Build <v4i32, R600_Reg128, i32, R600_Reg32>;
@@ -17125,10 +18184,10 @@ index 0000000..64bab18
 +} // End isR600toCayman Predicate
 diff --git a/lib/Target/R600/R600Intrinsics.td b/lib/Target/R600/R600Intrinsics.td
 new file mode 100644
-index 0000000..3825bc4
+index 0000000..6046f0d
 --- /dev/null
 +++ b/lib/Target/R600/R600Intrinsics.td
-@@ -0,0 +1,32 @@
+@@ -0,0 +1,57 @@
 +//===-- R600Intrinsics.td - R600 Intrinsic defs --------*- tablegen -*-----===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -17143,30 +18202,283 @@ index 0000000..3825bc4
 +//===----------------------------------------------------------------------===//
 +
 +let TargetPrefix = "R600", isTarget = 1 in {
-+  def int_R600_load_input : Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
-+  def int_R600_load_input_perspective :
-+    Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
-+  def int_R600_load_input_constant :
-+    Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
-+  def int_R600_load_input_linear :
-+    Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrReadMem]>;
++  def int_R600_load_input :
++    Intrinsic<[llvm_float_ty], [llvm_i32_ty], [IntrNoMem]>;
++  def int_R600_interp_input :
++    Intrinsic<[llvm_float_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
++  def int_R600_load_texbuf :
++    Intrinsic<[llvm_v4f32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;
++  def int_R600_store_swizzle :
++    Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
++
 +  def int_R600_store_stream_output :
-+    Intrinsic<[], [llvm_float_ty, llvm_i32_ty, llvm_i32_ty], []>;
++    Intrinsic<[], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], []>;
 +  def int_R600_store_pixel_color :
 +      Intrinsic<[], [llvm_float_ty, llvm_i32_ty], []>;
 +  def int_R600_store_pixel_depth :
 +      Intrinsic<[], [llvm_float_ty], []>;
 +  def int_R600_store_pixel_stencil :
 +      Intrinsic<[], [llvm_float_ty], []>;
-+  def int_R600_store_pixel_dummy :
-+      Intrinsic<[], [], []>;
++  def int_R600_store_dummy :
++      Intrinsic<[], [llvm_i32_ty], []>;
++}
++let TargetPrefix = "r600", isTarget = 1 in {
++
++class R600ReadPreloadRegisterIntrinsic<string name>
++  : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>,
++    GCCBuiltin<name>;
++
++multiclass R600ReadPreloadRegisterIntrinsic_xyz<string prefix> {
++  def _x : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_x")>;
++  def _y : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_y")>;
++  def _z : R600ReadPreloadRegisterIntrinsic<!strconcat(prefix, "_z")>;
++}
++
++defm int_r600_read_global_size : R600ReadPreloadRegisterIntrinsic_xyz <
++                                       "__builtin_r600_read_global_size">;
++defm int_r600_read_local_size : R600ReadPreloadRegisterIntrinsic_xyz <
++                                       "__builtin_r600_read_local_size">;
++defm int_r600_read_ngroups : R600ReadPreloadRegisterIntrinsic_xyz <
++                                       "__builtin_r600_read_ngroups">;
++defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
++                                       "__builtin_r600_read_tgid">;
++defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
++                                       "__builtin_r600_read_tidig">;
++}
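Each defm above expands to three intrinsics with _x/_y/_z suffixes, and the
GCCBuiltin names expose them to frontends. A minimal sketch of a frontend-side
use, assuming a compiler that actually provides these builtins (the function
itself is hypothetical; each call should lower to the matching
llvm.r600.read.*.{x,y,z} intrinsic declared above):

    /* Illustrative only: the builtin names follow from the !strconcat
       calls above; availability depends on the targeted frontend. */
    int work_item_info(void) {
      return __builtin_r600_read_tgid_x()      /* thread group id, x    */
           + __builtin_r600_read_tidig_y()     /* thread id in group, y */
           + __builtin_r600_read_ngroups_z();  /* number of groups, z   */
    }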
+diff --git a/lib/Target/R600/R600LowerConstCopy.cpp b/lib/Target/R600/R600LowerConstCopy.cpp
+new file mode 100644
+index 0000000..c8c27a8
+--- /dev/null
++++ b/lib/Target/R600/R600LowerConstCopy.cpp
+@@ -0,0 +1,222 @@
++//===-- R600LowerConstCopy.cpp - Propagate ConstCopy / lower them to MOV---===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// This pass is intended to handle the remaining ConstCopy pseudo
++/// MachineInstrs. ISel will fold each constant buffer read into a scalar ALU
++/// instruction, but it cannot fold them into vector instructions such as DOT4
++/// or CUBE; for those, ISel emits a ConstCopy instead. This pass (executed
++/// after the special-instruction expansion) tries to fold them where possible
++/// and replaces them with a MOV otherwise.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPU.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "R600InstrInfo.h"
++#include "llvm/GlobalValue.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++
++namespace llvm {
++
++class R600LowerConstCopy : public MachineFunctionPass {
++private:
++  static char ID;
++  const R600InstrInfo *TII;
++
++  struct ConstPairs {
++    unsigned XYPair;
++    unsigned ZWPair;
++  };
++
++  bool canFoldInBundle(ConstPairs &UsedConst, unsigned ReadConst) const;
++public:
++  R600LowerConstCopy(TargetMachine &tm);
++  virtual bool runOnMachineFunction(MachineFunction &MF);
++
++  const char *getPassName() const { return "R600 Eliminate Symbolic Operand"; }
++};
++
++char R600LowerConstCopy::ID = 0;
++
++R600LowerConstCopy::R600LowerConstCopy(TargetMachine &tm) :
++    MachineFunctionPass(ID),
++    TII (static_cast<const R600InstrInfo *>(tm.getInstrInfo()))
++{
++}
++
++bool R600LowerConstCopy::canFoldInBundle(ConstPairs &UsedConst,
++    unsigned ReadConst) const {
++  unsigned ReadConstChan = ReadConst & 3;
++  unsigned ReadConstIndex = ReadConst & (~3);
++  if (ReadConstChan < 2) {
++    if (!UsedConst.XYPair) {
++      UsedConst.XYPair = ReadConstIndex;
++    }
++    return UsedConst.XYPair == ReadConstIndex;
++  } else {
++    if (!UsedConst.ZWPair) {
++      UsedConst.ZWPair = ReadConstIndex;
++    }
++    return UsedConst.ZWPair == ReadConstIndex;
++  }
++}
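A short worked trace of the pairing rule above, with invented values: the low
two bits of ReadConst pick the channel, the remaining bits the 4-aligned const
index, and X/Y reads share one pair slot while Z/W reads share the other.

    // Starting from ConstPairs CP = {0, 0}:
    //   canFoldInBundle(CP, 17) -> chan 1 (y), index 16: claims XY slot -> true
    //   canFoldInBundle(CP, 16) -> chan 0 (x), index 16: same XY pair   -> true
    //   canFoldInBundle(CP, 22) -> chan 2 (z), index 20: claims ZW slot -> true
    //   canFoldInBundle(CP, 25) -> chan 1 (y), index 24: XY holds 16    -> false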
++
++static bool isControlFlow(const MachineInstr &MI) {
++  return (MI.getOpcode() == AMDGPU::IF_PREDICATE_SET) ||
++  (MI.getOpcode() == AMDGPU::ENDIF) ||
++  (MI.getOpcode() == AMDGPU::ELSE) ||
++  (MI.getOpcode() == AMDGPU::WHILELOOP) ||
++  (MI.getOpcode() == AMDGPU::BREAK);
++}
++
++bool R600LowerConstCopy::runOnMachineFunction(MachineFunction &MF) {
++
++  for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
++                                                  BB != BB_E; ++BB) {
++    MachineBasicBlock &MBB = *BB;
++    DenseMap<unsigned, MachineInstr *> RegToConstIndex;
++    for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(),
++        E = MBB.instr_end(); I != E;) {
++
++      if (I->getOpcode() == AMDGPU::CONST_COPY) {
++        MachineInstr &MI = *I;
++        I = llvm::next(I);
++        unsigned DstReg = MI.getOperand(0).getReg();
++        DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++            RegToConstIndex.find(DstReg);
++        if (SrcMI != RegToConstIndex.end()) {
++          SrcMI->second->eraseFromParent();
++          RegToConstIndex.erase(SrcMI);
++        }
++        MachineInstr *NewMI = 
++            TII->buildDefaultInstruction(MBB, &MI, AMDGPU::MOV,
++            MI.getOperand(0).getReg(), AMDGPU::ALU_CONST);
++        TII->setImmOperand(NewMI, R600Operands::SRC0_SEL,
++            MI.getOperand(1).getImm());
++        RegToConstIndex[DstReg] = NewMI;
++        MI.eraseFromParent();
++        continue;
++      }
++
++      std::vector<unsigned> Defs;
++      // We consider all instructions as bundled because the algorithm that
++      // handles const read port limitations inside an IG is still valid with
++      // single instructions.
++      std::vector<MachineInstr *> Bundle;
++
++      if (I->isBundle()) {
++        unsigned BundleSize = I->getBundleSize();
++        for (unsigned i = 0; i < BundleSize; i++) {
++          I = llvm::next(I);
++          Bundle.push_back(I);
++        }
++      } else if (TII->isALUInstr(I->getOpcode())){
++        Bundle.push_back(I);
++      } else if (isControlFlow(*I)) {
++          RegToConstIndex.clear();
++          I = llvm::next(I);
++          continue;
++      } else {
++        MachineInstr &MI = *I;
++        for (MachineInstr::mop_iterator MOp = MI.operands_begin(),
++            MOpE = MI.operands_end(); MOp != MOpE; ++MOp) {
++          MachineOperand &MO = *MOp;
++          if (!MO.isReg())
++            continue;
++          if (MO.isDef()) {
++            Defs.push_back(MO.getReg());
++          } else {
++            // Either a TEX or an export instruction; prevent erasing the
++            // def of a used operand
++            RegToConstIndex.erase(MO.getReg());
++            for (MCSubRegIterator SR(MO.getReg(), &TII->getRegisterInfo());
++                SR.isValid(); ++SR) {
++              RegToConstIndex.erase(*SR);
++            }
++          }
++        }
++      }
++
++
++      R600Operands::Ops OpTable[3][2] = {
++        {R600Operands::SRC0, R600Operands::SRC0_SEL},
++        {R600Operands::SRC1, R600Operands::SRC1_SEL},
++        {R600Operands::SRC2, R600Operands::SRC2_SEL},
++      };
++
++      for(std::vector<MachineInstr *>::iterator It = Bundle.begin(),
++          ItE = Bundle.end(); It != ItE; ++It) {
++        MachineInstr *MI = *It;
++        if (TII->isPredicated(MI)) {
++          // We don't want to erase the previous assignment
++          RegToConstIndex.erase(MI->getOperand(0).getReg());
++        } else {
++          int WriteIDX = TII->getOperandIdx(MI->getOpcode(), R600Operands::WRITE);
++          if (WriteIDX < 0 || MI->getOperand(WriteIDX).getImm())
++            Defs.push_back(MI->getOperand(0).getReg());
++        }
++      }
++
++      ConstPairs CP = {0,0};
++      for (unsigned SrcOp = 0; SrcOp < 3; SrcOp++) {
++        for(std::vector<MachineInstr *>::iterator It = Bundle.begin(),
++            ItE = Bundle.end(); It != ItE; ++It) {
++          MachineInstr *MI = *It;
++          int SrcIdx = TII->getOperandIdx(MI->getOpcode(), OpTable[SrcOp][0]);
++          if (SrcIdx < 0)
++            continue;
++          MachineOperand &MO = MI->getOperand(SrcIdx);
++          DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++              RegToConstIndex.find(MO.getReg());
++          if (SrcMI != RegToConstIndex.end()) {
++            MachineInstr *CstMov = SrcMI->second;
++            int ConstMovSel =
++                TII->getOperandIdx(CstMov->getOpcode(), R600Operands::SRC0_SEL);
++            unsigned ConstIndex = CstMov->getOperand(ConstMovSel).getImm();
++            if (MI->isInsideBundle() && canFoldInBundle(CP, ConstIndex)) {
++              TII->setImmOperand(MI, OpTable[SrcOp][1], ConstIndex);
++              MI->getOperand(SrcIdx).setReg(AMDGPU::ALU_CONST);
++            } else {
++              RegToConstIndex.erase(SrcMI);
++            }
++          }
++        }
++      }
++
++      for (std::vector<unsigned>::iterator It = Defs.begin(), ItE = Defs.end();
++          It != ItE; ++It) {
++        DenseMap<unsigned, MachineInstr *>::iterator SrcMI =
++            RegToConstIndex.find(*It);
++        if (SrcMI != RegToConstIndex.end()) {
++          SrcMI->second->eraseFromParent();
++          RegToConstIndex.erase(SrcMI);
++        }
++      }
++      I = llvm::next(I);
++    }
++
++    if (MBB.succ_empty()) {
++      for (DenseMap<unsigned, MachineInstr *>::iterator
++          DI = RegToConstIndex.begin(), DE = RegToConstIndex.end();
++          DI != DE; ++DI) {
++        DI->second->eraseFromParent();
++      }
++    }
++  }
++  return false;
++}
++
++FunctionPass *createR600LowerConstCopy(TargetMachine &tm) {
++  return new R600LowerConstCopy(tm);
++}
++
 +}
++
++
 diff --git a/lib/Target/R600/R600MachineFunctionInfo.cpp b/lib/Target/R600/R600MachineFunctionInfo.cpp
 new file mode 100644
-index 0000000..4eb5efa
+index 0000000..40aec83
 --- /dev/null
 +++ b/lib/Target/R600/R600MachineFunctionInfo.cpp
-@@ -0,0 +1,34 @@
+@@ -0,0 +1,18 @@
 +//===-- R600MachineFunctionInfo.cpp - R600 Machine Function Info-*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -17182,31 +18494,15 @@ index 0000000..4eb5efa
 +using namespace llvm;
 +
 +R600MachineFunctionInfo::R600MachineFunctionInfo(const MachineFunction &MF)
-+  : MachineFunctionInfo(),
-+    HasLinearInterpolation(false),
-+    HasPerspectiveInterpolation(false) {
++  : MachineFunctionInfo() {
 +    memset(Outputs, 0, sizeof(Outputs));
-+    memset(StreamOutputs, 0, sizeof(StreamOutputs));
 +  }
-+
-+unsigned R600MachineFunctionInfo::GetIJPerspectiveIndex() const {
-+  assert(HasPerspectiveInterpolation);
-+  return 0;
-+}
-+
-+unsigned R600MachineFunctionInfo::GetIJLinearIndex() const {
-+  assert(HasLinearInterpolation);
-+  if (HasPerspectiveInterpolation)
-+    return 1;
-+  else
-+    return 0;
-+}
 diff --git a/lib/Target/R600/R600MachineFunctionInfo.h b/lib/Target/R600/R600MachineFunctionInfo.h
 new file mode 100644
-index 0000000..e97fb5b
+index 0000000..41e4894
 --- /dev/null
 +++ b/lib/Target/R600/R600MachineFunctionInfo.h
-@@ -0,0 +1,39 @@
+@@ -0,0 +1,33 @@
 +//===-- R600MachineFunctionInfo.h - R600 Machine Function Info ----*- C++ -*-=//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -17222,6 +18518,7 @@ index 0000000..e97fb5b
 +#ifndef R600MACHINEFUNCTIONINFO_H
 +#define R600MACHINEFUNCTIONINFO_H
 +
++#include "llvm/ADT/BitVector.h"
 +#include "llvm/CodeGen/MachineFunction.h"
 +#include "llvm/CodeGen/SelectionDAG.h"
 +#include <vector>
@@ -17232,15 +18529,8 @@ index 0000000..e97fb5b
 +
 +public:
 +  R600MachineFunctionInfo(const MachineFunction &MF);
-+  std::vector<unsigned> ReservedRegs;
++  std::vector<unsigned> IndirectRegs;
 +  SDNode *Outputs[16];
-+  SDNode *StreamOutputs[64][4];
-+  bool HasLinearInterpolation;
-+  bool HasPerspectiveInterpolation;
-+
-+  unsigned GetIJLinearIndex() const;
-+  unsigned GetIJPerspectiveIndex() const;
-+
 +};
 +
 +} // End llvm namespace
@@ -17248,10 +18538,10 @@ index 0000000..e97fb5b
 +#endif //R600MACHINEFUNCTIONINFO_H
 diff --git a/lib/Target/R600/R600RegisterInfo.cpp b/lib/Target/R600/R600RegisterInfo.cpp
 new file mode 100644
-index 0000000..a39f83d
+index 0000000..bbd7995
 --- /dev/null
 +++ b/lib/Target/R600/R600RegisterInfo.cpp
-@@ -0,0 +1,89 @@
+@@ -0,0 +1,99 @@
 +//===-- R600RegisterInfo.cpp - R600 Register Information ------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -17269,6 +18559,7 @@ index 0000000..a39f83d
 +#include "R600RegisterInfo.h"
 +#include "AMDGPUTargetMachine.h"
 +#include "R600Defines.h"
++#include "R600InstrInfo.h"
 +#include "R600MachineFunctionInfo.h"
 +
 +using namespace llvm;
@@ -17282,7 +18573,6 @@ index 0000000..a39f83d
 +
 +BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
 +  BitVector Reserved(getNumRegs());
-+  const R600MachineFunctionInfo * MFI = MF.getInfo<R600MachineFunctionInfo>();
 +
 +  Reserved.set(AMDGPU::ZERO);
 +  Reserved.set(AMDGPU::HALF);
@@ -17292,21 +18582,30 @@ index 0000000..a39f83d
 +  Reserved.set(AMDGPU::NEG_ONE);
 +  Reserved.set(AMDGPU::PV_X);
 +  Reserved.set(AMDGPU::ALU_LITERAL_X);
++  Reserved.set(AMDGPU::ALU_CONST);
 +  Reserved.set(AMDGPU::PREDICATE_BIT);
 +  Reserved.set(AMDGPU::PRED_SEL_OFF);
 +  Reserved.set(AMDGPU::PRED_SEL_ZERO);
 +  Reserved.set(AMDGPU::PRED_SEL_ONE);
 +
-+  for (TargetRegisterClass::iterator I = AMDGPU::R600_CReg32RegClass.begin(),
-+                        E = AMDGPU::R600_CReg32RegClass.end(); I != E; ++I) {
++  for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(),
++                        E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) {
 +    Reserved.set(*I);
 +  }
 +
-+  for (std::vector<unsigned>::const_iterator I = MFI->ReservedRegs.begin(),
-+                                    E = MFI->ReservedRegs.end(); I != E; ++I) {
++  for (TargetRegisterClass::iterator I = AMDGPU::TRegMemRegClass.begin(),
++                                     E = AMDGPU::TRegMemRegClass.end();
++                                     I !=  E; ++I) {
 +    Reserved.set(*I);
 +  }
 +
++  const R600InstrInfo *RII = static_cast<const R600InstrInfo*>(&TII);
++  std::vector<unsigned> IndirectRegs = RII->getIndirectReservedRegs(MF);
++  for (std::vector<unsigned>::iterator I = IndirectRegs.begin(),
++                                       E = IndirectRegs.end();
++                                       I != E; ++I) {
++    Reserved.set(*I);
++  }
 +  return Reserved;
 +}
 +
@@ -17335,12 +18634,13 @@ index 0000000..a39f83d
 +unsigned R600RegisterInfo::getSubRegFromChannel(unsigned Channel) const {
 +  switch (Channel) {
 +    default: assert(!"Invalid channel index"); return 0;
-+    case 0: return AMDGPU::sel_x;
-+    case 1: return AMDGPU::sel_y;
-+    case 2: return AMDGPU::sel_z;
-+    case 3: return AMDGPU::sel_w;
++    case 0: return AMDGPU::sub0;
++    case 1: return AMDGPU::sub1;
++    case 2: return AMDGPU::sub2;
++    case 3: return AMDGPU::sub3;
 +  }
 +}
++
 diff --git a/lib/Target/R600/R600RegisterInfo.h b/lib/Target/R600/R600RegisterInfo.h
 new file mode 100644
 index 0000000..c170ccb
@@ -17404,10 +18704,10 @@ index 0000000..c170ccb
 +#endif // AMDIDSAREGISTERINFO_H_
 diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
 new file mode 100644
-index 0000000..d3d6d25
+index 0000000..a7d847a
 --- /dev/null
 +++ b/lib/Target/R600/R600RegisterInfo.td
-@@ -0,0 +1,107 @@
+@@ -0,0 +1,146 @@
 +
 +class R600Reg <string name, bits<16> encoding> : Register<name> {
 +  let Namespace = "AMDGPU";
@@ -17429,7 +18729,7 @@ index 0000000..d3d6d25
 +class R600Reg_128<string n, list<Register> subregs, bits<16> encoding> :
 +    RegisterWithSubRegs<n, subregs> {
 +  let Namespace = "AMDGPU";
-+  let SubRegIndices = [sel_x, sel_y, sel_z, sel_w];
++  let SubRegIndices = [sub0, sub1, sub2, sub3];
 +  let HWEncoding = encoding;
 +}
 +
@@ -17438,9 +18738,11 @@ index 0000000..d3d6d25
 +    // 32-bit Temporary Registers
 +    def T#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index, Chan>;
 +
-+    // 32-bit Constant Registers (There are more than 128, this the number
-+    // that is currently supported.
-+    def C#Index#_#Chan : R600RegWithChan <"C"#Index#"."#Chan, Index, Chan>;
++    // Indirect addressing offset registers
++    def Addr#Index#_#Chan : R600RegWithChan <"T("#Index#" + AR.x)."#Chan,
++                                              Index, Chan>;
++    def TRegMem#Index#_#Chan : R600RegWithChan <"T"#Index#"."#Chan, Index,
++                                                Chan>;
 +  }
 +  // 128-bit Temporary Registers
 +  def T#Index#_XYZW : R600Reg_128 <"T"#Index#".XYZW",
@@ -17471,19 +18773,25 @@ index 0000000..d3d6d25
 +def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>;
 +def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>;
 +def PRED_SEL_ONE : R600Reg<"Pred_sel_one", 3>;
++def AR_X : R600Reg<"AR.x", 0>;
 +
 +def R600_ArrayBase : RegisterClass <"AMDGPU", [f32, i32], 32,
 +                          (add (sequence "ArrayBase%u", 448, 464))>;
++// special registers for ALU src operands
++// const buffer reference, SRCx_SEL contains index
++def ALU_CONST : R600Reg<"CBuf", 0>;
++// interpolation param reference, SRCx_SEL contains index
++def ALU_PARAM : R600Reg<"Param", 0>;
++
++let isAllocatable = 0 in {
++
++// XXX: Only use the X channel, until we support wider stack widths
++def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X", 0, 127))>;
 +
-+def R600_CReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
-+                          (add (interleave
-+                                  (interleave (sequence "C%u_X", 0, 127),
-+                                              (sequence "C%u_Z", 0, 127)),
-+                                  (interleave (sequence "C%u_Y", 0, 127),
-+                                              (sequence "C%u_W", 0, 127))))>;
++} // End isAllocatable = 0
 +
 +def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
-+                                   (add (sequence "T%u_X", 0, 127))>;
++                                   (add (sequence "T%u_X", 0, 127), AR_X)>;
 +
 +def R600_TReg32_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
 +                                   (add (sequence "T%u_Y", 0, 127))>;
@@ -17495,15 +18803,16 @@ index 0000000..d3d6d25
 +                                   (add (sequence "T%u_W", 0, 127))>;
 +
 +def R600_TReg32 : RegisterClass <"AMDGPU", [f32, i32], 32,
-+                          (add (interleave
-+                                 (interleave R600_TReg32_X, R600_TReg32_Z),
-+                                 (interleave R600_TReg32_Y, R600_TReg32_W)))>;
++                                   (interleave R600_TReg32_X, R600_TReg32_Y,
++                                               R600_TReg32_Z, R600_TReg32_W)>;
 +
 +def R600_Reg32 : RegisterClass <"AMDGPU", [f32, i32], 32, (add
 +    R600_TReg32,
-+    R600_CReg32,
 +    R600_ArrayBase,
-+    ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF)>;
++    R600_Addr,
++    ZERO, HALF, ONE, ONE_INT, PV_X, ALU_LITERAL_X, NEG_ONE, NEG_HALF,
++    ALU_CONST, ALU_PARAM
++    )>;
 +
 +def R600_Predicate : RegisterClass <"AMDGPU", [i32], 32, (add
 +    PRED_SEL_OFF, PRED_SEL_ZERO, PRED_SEL_ONE)>;
@@ -17515,6 +18824,36 @@ index 0000000..d3d6d25
 +                                (add (sequence "T%u_XYZW", 0, 127))> {
 +  let CopyCost = -1;
 +}
++
++//===----------------------------------------------------------------------===//
++// Register classes for indirect addressing
++//===----------------------------------------------------------------------===//
++
++// Super register for all the Indirect Registers.  This register class is used
++// by the REG_SEQUENCE instruction to specify the registers to use for direct
++// reads / writes which may be written / read by an indirect address.
++class IndirectSuper<string n, list<Register> subregs> :
++    RegisterWithSubRegs<n, subregs> {
++  let Namespace = "AMDGPU";
++  let SubRegIndices =
++ [sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
++  sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15];
++}
++
++def IndirectSuperReg : IndirectSuper<"Indirect",
++  [TRegMem0_X, TRegMem1_X, TRegMem2_X, TRegMem3_X, TRegMem4_X, TRegMem5_X,
++   TRegMem6_X, TRegMem7_X, TRegMem8_X, TRegMem9_X, TRegMem10_X, TRegMem11_X,
++   TRegMem12_X, TRegMem13_X, TRegMem14_X, TRegMem15_X]
++>;
++
++def IndirectReg : RegisterClass<"AMDGPU", [f32, i32], 32, (add IndirectSuperReg)>;
++
++// This register class defines the registers that are the storage units for
++// the "Indirect Addressing" pseudo memory space.
++// XXX: Only use the X channel, until we support wider stack widths
++def TRegMem : RegisterClass<"AMDGPU", [f32, i32], 32,
++  (add (sequence "TRegMem%u_X", 0, 16))
++>;
 diff --git a/lib/Target/R600/R600Schedule.td b/lib/Target/R600/R600Schedule.td
 new file mode 100644
 index 0000000..7ede181
@@ -18053,10 +19392,10 @@ index 0000000..832e44d
 +}
 diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
 new file mode 100644
-index 0000000..cd6e0e9
+index 0000000..694c045
 --- /dev/null
 +++ b/lib/Target/R600/SIISelLowering.cpp
-@@ -0,0 +1,512 @@
+@@ -0,0 +1,399 @@
 +//===-- SIISelLowering.cpp - SI DAG Lowering Implementation ---------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18090,16 +19429,16 @@ index 0000000..cd6e0e9
 +  addRegisterClass(MVT::f32, &AMDGPU::VReg_32RegClass);
 +  addRegisterClass(MVT::i32, &AMDGPU::VReg_32RegClass);
 +  addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);
-+  addRegisterClass(MVT::i1, &AMDGPU::SCCRegRegClass);
-+  addRegisterClass(MVT::i1, &AMDGPU::VCCRegRegClass);
++  addRegisterClass(MVT::i1, &AMDGPU::SReg_64RegClass);
 +
-+  addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass);
-+  addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass);
++  addRegisterClass(MVT::v1i32, &AMDGPU::VReg_32RegClass);
++  addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass);
++  addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
++  addRegisterClass(MVT::v8i32, &AMDGPU::VReg_256RegClass);
++  addRegisterClass(MVT::v16i32, &AMDGPU::VReg_512RegClass);
 +
 +  computeRegisterProperties();
 +
-+  setOperationAction(ISD::AND, MVT::i1, Custom);
-+
 +  setOperationAction(ISD::ADD, MVT::i64, Legal);
 +  setOperationAction(ISD::ADD, MVT::i32, Legal);
 +
@@ -18125,23 +19464,16 @@ index 0000000..cd6e0e9
 +  MachineRegisterInfo & MRI = BB->getParent()->getRegInfo();
 +  MachineBasicBlock::iterator I = MI;
 +
-+  if (TII->get(MI->getOpcode()).TSFlags & SIInstrFlags::NEED_WAIT) {
-+    AppendS_WAITCNT(MI, *BB, llvm::next(I));
-+    return BB;
-+  }
-+
 +  switch (MI->getOpcode()) {
 +  default:
 +    return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
 +  case AMDGPU::BRANCH: return BB;
 +  case AMDGPU::CLAMP_SI:
-+    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
 +           .addOperand(MI->getOperand(0))
 +           .addOperand(MI->getOperand(1))
-+           // VSRC1-2 are unused, but we still need to fill all the
-+           // operand slots, so we just reuse the VSRC0 operand
-+           .addOperand(MI->getOperand(1))
-+           .addOperand(MI->getOperand(1))
++           .addImm(0x80) // SRC1
++           .addImm(0x80) // SRC2
 +           .addImm(0) // ABS
 +           .addImm(1) // CLAMP
 +           .addImm(0) // OMOD
@@ -18150,13 +19482,11 @@ index 0000000..cd6e0e9
 +    break;
 +
 +  case AMDGPU::FABS_SI:
-+    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
 +                 .addOperand(MI->getOperand(0))
 +                 .addOperand(MI->getOperand(1))
-+                 // VSRC1-2 are unused, but we still need to fill all the
-+                 // operand slots, so we just reuse the VSRC0 operand
-+                 .addOperand(MI->getOperand(1))
-+                 .addOperand(MI->getOperand(1))
++                 .addImm(0x80) // SRC1
++                 .addImm(0x80) // SRC2
 +                 .addImm(1) // ABS
 +                 .addImm(0) // CLAMP
 +                 .addImm(0) // OMOD
@@ -18165,13 +19495,11 @@ index 0000000..cd6e0e9
 +    break;
 +
 +  case AMDGPU::FNEG_SI:
-+    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_MOV_B32_e64))
++    BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::V_ADD_F32_e64))
 +                 .addOperand(MI->getOperand(0))
 +                 .addOperand(MI->getOperand(1))
-+                 // VSRC1-2 are unused, but we still need to fill all the
-+                 // operand slots, so we just reuse the VSRC0 operand
-+                 .addOperand(MI->getOperand(1))
-+                 .addOperand(MI->getOperand(1))
++                 .addImm(0x80) // SRC1
++                 .addImm(0x80) // SRC2
 +                 .addImm(0) // ABS
 +                 .addImm(0) // CLAMP
 +                 .addImm(0) // OMOD
@@ -18187,29 +19515,13 @@ index 0000000..cd6e0e9
 +  case AMDGPU::SI_INTERP:
 +    LowerSI_INTERP(MI, *BB, I, MRI);
 +    break;
-+  case AMDGPU::SI_INTERP_CONST:
-+    LowerSI_INTERP_CONST(MI, *BB, I, MRI);
-+    break;
-+  case AMDGPU::SI_KIL:
-+    LowerSI_KIL(MI, *BB, I, MRI);
-+    break;
 +  case AMDGPU::SI_WQM:
 +    LowerSI_WQM(MI, *BB, I, MRI);
 +    break;
-+  case AMDGPU::SI_V_CNDLT:
-+    LowerSI_V_CNDLT(MI, *BB, I, MRI);
-+    break;
 +  }
 +  return BB;
 +}
 +
-+void SITargetLowering::AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
-+    MachineBasicBlock::iterator I) const {
-+  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WAITCNT))
-+          .addImm(0);
-+}
-+
-+
 +void SITargetLowering::LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
 +    MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
 +  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_WQM_B64), AMDGPU::EXEC)
@@ -18249,57 +19561,6 @@ index 0000000..cd6e0e9
 +  MI->eraseFromParent();
 +}
 +
-+void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr *MI,
-+    MachineBasicBlock &BB, MachineBasicBlock::iterator I,
-+    MachineRegisterInfo &MRI) const {
-+  MachineOperand dst = MI->getOperand(0);
-+  MachineOperand attr_chan = MI->getOperand(1);
-+  MachineOperand attr = MI->getOperand(2);
-+  MachineOperand params = MI->getOperand(3);
-+  unsigned M0 = MRI.createVirtualRegister(&AMDGPU::M0RegRegClass);
-+
-+  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_MOV_B32), M0)
-+          .addOperand(params);
-+
-+  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_INTERP_MOV_F32))
-+          .addOperand(dst)
-+          .addOperand(attr_chan)
-+          .addOperand(attr)
-+          .addReg(M0);
-+
-+  MI->eraseFromParent();
-+}
-+
-+void SITargetLowering::LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
-+    MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-+  // Clear this pixel from the exec mask if the operand is negative
-+  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CMPX_LE_F32_e32),
-+          AMDGPU::VCC)
-+          .addReg(AMDGPU::SREG_LIT_0)
-+          .addOperand(MI->getOperand(0));
-+
-+  MI->eraseFromParent();
-+}
-+
-+void SITargetLowering::LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-+    MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const {
-+  unsigned VCC = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
-+
-+  BuildMI(BB, I, BB.findDebugLoc(I),
-+          TII->get(AMDGPU::V_CMP_GT_F32_e32),
-+          VCC)
-+          .addReg(AMDGPU::SREG_LIT_0)
-+          .addOperand(MI->getOperand(1));
-+
-+  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CNDMASK_B32_e32))
-+          .addOperand(MI->getOperand(0))
-+          .addOperand(MI->getOperand(3))
-+          .addOperand(MI->getOperand(2))
-+          .addReg(VCC);
-+
-+  MI->eraseFromParent();
-+}
-+
 +EVT SITargetLowering::getSetCCResultType(EVT VT) const {
 +  return MVT::i1;
 +}
@@ -18314,7 +19575,6 @@ index 0000000..cd6e0e9
 +  case ISD::BRCOND: return LowerBRCOND(Op, DAG);
 +  case ISD::LOAD: return LowerLOAD(Op, DAG);
 +  case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
-+  case ISD::AND: return Loweri1ContextSwitch(Op, DAG, ISD::AND);
 +  case ISD::INTRINSIC_WO_CHAIN: {
 +    unsigned IntrinsicID =
 +                         cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
@@ -18331,30 +19591,6 @@ index 0000000..cd6e0e9
 +  return SDValue();
 +}
 +
-+/// \brief The function is for lowering i1 operations on the
-+/// VCC register.
-+///
-+/// In the VALU context, VCC is a one bit register, but in the
-+/// SALU context the VCC is a 64-bit register (1-bit per thread).  Since only
-+/// the SALU can perform operations on the VCC register, we need to promote
-+/// the operand types from i1 to i64 in order for tablegen to be able to match
-+/// this operation to the correct SALU instruction.  We do this promotion by
-+/// wrapping the operands in a CopyToReg node.
-+///
-+SDValue SITargetLowering::Loweri1ContextSwitch(SDValue Op,
-+                                               SelectionDAG &DAG,
-+                                               unsigned VCCNode) const {
-+  DebugLoc DL = Op.getDebugLoc();
-+
-+  SDValue OpNode = DAG.getNode(VCCNode, DL, MVT::i64,
-+                               DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
-+                                           Op.getOperand(0)),
-+                               DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i64,
-+                                           Op.getOperand(1)));
-+
-+  return DAG.getNode(SIISD::VCC_BITCAST, DL, MVT::i1, OpNode);
-+}
-+
 +/// \brief Helper function for LowerBRCOND
 +static SDNode *findUser(SDValue Value, unsigned Opcode) {
 +
@@ -18559,22 +19795,12 @@ index 0000000..cd6e0e9
 +  }
 +  return SDValue();
 +}
-+
-+#define NODE_NAME_CASE(node) case SIISD::node: return #node;
-+
-+const char* SITargetLowering::getTargetNodeName(unsigned Opcode) const {
-+  switch (Opcode) {
-+  default: return AMDGPUTargetLowering::getTargetNodeName(Opcode);
-+  NODE_NAME_CASE(VCC_AND)
-+  NODE_NAME_CASE(VCC_BITCAST)
-+  }
-+}
 diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
 new file mode 100644
-index 0000000..c088112
+index 0000000..5d048f8
 --- /dev/null
 +++ b/lib/Target/R600/SIISelLowering.h
-@@ -0,0 +1,62 @@
+@@ -0,0 +1,48 @@
 +//===-- SIISelLowering.h - SI DAG Lowering Interface ------------*- C++ -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18600,26 +19826,13 @@ index 0000000..c088112
 +class SITargetLowering : public AMDGPUTargetLowering {
 +  const SIInstrInfo * TII;
 +
-+  /// Memory reads and writes are syncronized using the S_WAITCNT instruction.
-+  /// This function takes the most conservative approach and inserts an
-+  /// S_WAITCNT instruction after every read and write.
-+  void AppendS_WAITCNT(MachineInstr *MI, MachineBasicBlock &BB,
-+              MachineBasicBlock::iterator I) const;
 +  void LowerMOV_IMM(MachineInstr *MI, MachineBasicBlock &BB,
 +              MachineBasicBlock::iterator I, unsigned Opcode) const;
 +  void LowerSI_INTERP(MachineInstr *MI, MachineBasicBlock &BB,
 +              MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-+  void LowerSI_INTERP_CONST(MachineInstr *MI, MachineBasicBlock &BB,
-+              MachineBasicBlock::iterator I, MachineRegisterInfo &MRI) const;
-+  void LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
-+              MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
 +  void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
 +              MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
-+  void LowerSI_V_CNDLT(MachineInstr *MI, MachineBasicBlock &BB,
-+              MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
 +
-+  SDValue Loweri1ContextSwitch(SDValue Op, SelectionDAG &DAG,
-+                                           unsigned VCCNode) const;
 +  SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
 +  SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
 +  SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
@@ -18631,18 +19844,376 @@ index 0000000..c088112
 +  virtual EVT getSetCCResultType(EVT VT) const;
 +  virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
 +  virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
-+  virtual const char* getTargetNodeName(unsigned Opcode) const;
 +};
 +
 +} // End namespace llvm
 +
 +#endif //SIISELLOWERING_H
+diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
+new file mode 100644
+index 0000000..24fc929
+--- /dev/null
++++ b/lib/Target/R600/SIInsertWaits.cpp
+@@ -0,0 +1,353 @@
++//===-- SIInsertWaits.cpp - Insert S_WAITCNT instructions -----------------===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++/// \file
++/// \brief Insert wait instructions for memory reads and writes.
++///
++/// Memory reads and writes are issued asynchronously, so we need to insert
++/// S_WAITCNT instructions when we want to access any of their results or
++/// overwrite any register that's used asynchronously.
++//
++//===----------------------------------------------------------------------===//
++
++#include "AMDGPU.h"
++#include "SIInstrInfo.h"
++#include "SIMachineFunctionInfo.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++
++using namespace llvm;
++
++namespace {
++
++/// \brief One variable for each of the hardware counters
++typedef union {
++  struct {
++    unsigned VM;
++    unsigned EXP;
++    unsigned LGKM;
++  } Named;
++  unsigned Array[3];
++
++} Counters;
++
++typedef Counters RegCounters[512];
++typedef std::pair<unsigned, unsigned> RegInterval;
++
++class SIInsertWaits : public MachineFunctionPass {
++
++private:
++  static char ID;
++  const SIInstrInfo *TII;
++  const SIRegisterInfo &TRI;
++  const MachineRegisterInfo *MRI;
++
++  /// \brief Constant hardware limits
++  static const Counters WaitCounts;
++
++  /// \brief Constant zero value
++  static const Counters ZeroCounts;
++
++  /// \brief Counter values we have already waited on.
++  Counters WaitedOn;
++
++  /// \brief Counter values for last instruction issued.
++  Counters LastIssued;
++
++  /// \brief Registers used by async instructions.
++  RegCounters UsedRegs;
++
++  /// \brief Registers defined by async instructions.
++  RegCounters DefinedRegs;
++
++  /// \brief Different export instruction types seen since last wait.
++  unsigned ExpInstrTypesSeen;
++
++  /// \brief Get increment/decrement amount for this instruction.
++  Counters getHwCounts(MachineInstr &MI);
++
++  /// \brief Is operand relevant for async execution?
++  bool isOpRelevant(MachineOperand &Op);
++
++  /// \brief Get register interval an operand affects.
++  RegInterval getRegInterval(MachineOperand &Op);
++
++  /// \brief Handle an instruction's async components
++  void pushInstruction(MachineInstr &MI);
++
++  /// \brief Insert the actual wait instruction
++  bool insertWait(MachineBasicBlock &MBB,
++                  MachineBasicBlock::iterator I,
++                  const Counters &Counts);
++
++  /// \brief Resolve all operand dependencies to counter requirements
++  Counters handleOperands(MachineInstr &MI);
++
++public:
++  SIInsertWaits(TargetMachine &tm) :
++    MachineFunctionPass(ID),
++    TII(static_cast<const SIInstrInfo*>(tm.getInstrInfo())),
++    TRI(TII->getRegisterInfo()) { }
++
++  virtual bool runOnMachineFunction(MachineFunction &MF);
++
++  const char *getPassName() const {
++    return "SI insert wait  instructions";
++  }
++
++};
++
++} // End anonymous namespace
++
++char SIInsertWaits::ID = 0;
++
++const Counters SIInsertWaits::WaitCounts = { { 15, 7, 7 } };
++const Counters SIInsertWaits::ZeroCounts = { { 0, 0, 0 } };
++
++FunctionPass *llvm::createSIInsertWaits(TargetMachine &tm) {
++  return new SIInsertWaits(tm);
++}
++
++Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
++
++  uint64_t TSFlags = TII->get(MI.getOpcode()).TSFlags;
++  Counters Result;
++
++  Result.Named.VM = !!(TSFlags & SIInstrFlags::VM_CNT);
++
++  // Only consider stores or EXP for EXP_CNT
++  Result.Named.EXP = !!(TSFlags & SIInstrFlags::EXP_CNT &&
++      (MI.getOpcode() == AMDGPU::EXP || !MI.getDesc().mayStore()));
++
++  // LGKM may use larger values
++  if (TSFlags & SIInstrFlags::LGKM_CNT) {
++
++    MachineOperand &Op = MI.getOperand(0);
++    assert(Op.isReg() && "First LGKM operand must be a register!");
++
++    unsigned Reg = Op.getReg();
++    unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
++    Result.Named.LGKM = Size > 4 ? 2 : 1;
++
++  } else {
++    Result.Named.LGKM = 0;
++  }
++
++  return Result;
++}
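To make the LGKM sizing concrete, a worked example (the instruction is
illustrative): a scalar load whose first operand belongs to a 64-bit register
class has getSize() == 8 and so counts as 2, while a 32-bit destination counts
as 1.

    // Illustrative: Size is the byte size of the first operand's class.
    //   Size == 4 (one 32-bit reg)       -> Result.Named.LGKM = 1
    //   Size == 8 (e.g. an SReg_64 pair) -> Result.Named.LGKM = 2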
++
++bool SIInsertWaits::isOpRelevant(MachineOperand &Op) {
++
++  // Constants are always irrelevant
++  if (!Op.isReg())
++    return false;
++
++  // Defines are always relevant
++  if (Op.isDef())
++    return true;
++
++  // For exports all registers are relevant
++  MachineInstr &MI = *Op.getParent();
++  if (MI.getOpcode() == AMDGPU::EXP)
++    return true;
++
++  // For stores the stored value is also relevant
++  if (!MI.getDesc().mayStore())
++    return false;
++
++  for (MachineInstr::mop_iterator I = MI.operands_begin(),
++       E = MI.operands_end(); I != E; ++I) {
++
++    if (I->isReg() && I->isUse())
++      return Op.isIdenticalTo(*I);
++  }
++
++  return false;
++}
++
++RegInterval SIInsertWaits::getRegInterval(MachineOperand &Op) {
++
++  if (!Op.isReg())
++    return std::make_pair(0, 0);
++
++  unsigned Reg = Op.getReg();
++  unsigned Size = TRI.getMinimalPhysRegClass(Reg)->getSize();
++
++  assert(Size >= 4);
++
++  RegInterval Result;
++  Result.first = TRI.getEncodingValue(Reg);
++  Result.second = Result.first + Size / 4;
++
++  return Result;
++}
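The interval arithmetic above is easiest to see with numbers (encoding values
invented for illustration):

    // A 64-bit register whose low half has encoding value 10 gives Size == 8,
    // so the half-open interval is [10, 12): both 32-bit lanes are tracked in
    // UsedRegs / DefinedRegs.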
++
++void SIInsertWaits::pushInstruction(MachineInstr &MI) {
++
++  // Get the hardware counter increments and sum them up
++  Counters Increment = getHwCounts(MI);
++  unsigned Sum = 0;
++
++  for (unsigned i = 0; i < 3; ++i) {
++    LastIssued.Array[i] += Increment.Array[i];
++    Sum += Increment.Array[i];
++  }
++
++  // If we don't increase anything then that's it
++  if (Sum == 0)
++    return;
++
++  // Remember which export instructions we have seen
++  if (Increment.Named.EXP) {
++    ExpInstrTypesSeen |= MI.getOpcode() == AMDGPU::EXP ? 1 : 2;
++  }
++
++  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
++
++    MachineOperand &Op = MI.getOperand(i);
++    if (!isOpRelevant(Op))
++      continue;
++
++    RegInterval Interval = getRegInterval(Op);
++    for (unsigned j = Interval.first; j < Interval.second; ++j) {
++
++      // Remember which registers we define
++      if (Op.isDef())
++        DefinedRegs[j] = LastIssued;
++
++      // and which one we are using
++      if (Op.isUse())
++        UsedRegs[j] = LastIssued;
++    }
++  }
++}
++
++bool SIInsertWaits::insertWait(MachineBasicBlock &MBB,
++                               MachineBasicBlock::iterator I,
++                               const Counters &Required) {
++
++  // End of program? No need to wait on anything
++  if (I != MBB.end() && I->getOpcode() == AMDGPU::S_ENDPGM)
++    return false;
++
++  // Figure out if the async instructions execute in order
++  bool Ordered[3];
++
++  // VM_CNT is always ordered
++  Ordered[0] = true;
++
++  // EXP_CNT is unordered if we have both EXP & VM-writes
++  Ordered[1] = ExpInstrTypesSeen == 3;
++
++  // LGKM_CNT is handled as always unordered. TODO: Handle LDS and GDS
++  Ordered[2] = false;
++
++  // The values we are going to put into the S_WAITCNT instruction
++  Counters Counts = WaitCounts;
++
++  // Do we really need to wait?
++  bool NeedWait = false;
++
++  for (unsigned i = 0; i < 3; ++i) {
++
++    if (Required.Array[i] <= WaitedOn.Array[i])
++      continue;
++
++    NeedWait = true;
++    
++    if (Ordered[i]) {
++      unsigned Value = LastIssued.Array[i] - Required.Array[i];
++
++      // Adjust the value to the real hardware possibilities
++      Counts.Array[i] = std::min(Value, WaitCounts.Array[i]);
++
++    } else
++      Counts.Array[i] = 0;
++
++    // Remember what we have waited on
++    WaitedOn.Array[i] = LastIssued.Array[i] - Counts.Array[i];
++  }
++
++  if (!NeedWait)
++    return false;
++
++  // Reset EXP_CNT instruction types
++  if (Counts.Named.EXP == 0)
++    ExpInstrTypesSeen = 0;
++
++  // Build the wait instruction
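++  // S_WAITCNT packs all three counters into one immediate:
++  // VM_CNT in bits 3:0, EXP_CNT in bits 6:4, LGKM_CNT in bits 10:8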
++  BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
++          .addImm((Counts.Named.VM & 0xF) |
++                  ((Counts.Named.EXP & 0x7) << 4) |
++                  ((Counts.Named.LGKM & 0x7) << 8));
++
++  return true;
++}
++
++/// \brief Helper for handleOperands(); takes the component-wise maximum of two counter sets
++static void increaseCounters(Counters &Dst, const Counters &Src) {
++
++  for (unsigned i = 0; i < 3; ++i)
++    Dst.Array[i] = std::max(Dst.Array[i], Src.Array[i]);
++}
++
++Counters SIInsertWaits::handleOperands(MachineInstr &MI) {
++
++  Counters Result = ZeroCounts;
++
++  // For each register affected by this instruction, raise the required
++  // counter values: defs must wait for pending uses, uses for pending defs
++  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
++
++    MachineOperand &Op = MI.getOperand(i);
++    RegInterval Interval = getRegInterval(Op);
++    for (unsigned j = Interval.first; j < Interval.second; ++j) {
++
++      if (Op.isDef())
++        increaseCounters(Result, UsedRegs[j]);
++
++      if (Op.isUse())
++        increaseCounters(Result, DefinedRegs[j]);
++    }
++  }
++
++  return Result;
++}
++
++bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
++
++  bool Changes = false;
++
++  MRI = &MF.getRegInfo();
++
++  WaitedOn = ZeroCounts;
++  LastIssued = ZeroCounts;
++
++  memset(&UsedRegs, 0, sizeof(UsedRegs));
++  memset(&DefinedRegs, 0, sizeof(DefinedRegs));
++
++  for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
++       BI != BE; ++BI) {
++
++    MachineBasicBlock &MBB = *BI;
++    for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
++         I != E; ++I) {
++
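++      // Insert any wait this instruction needs, then record the counter
++      // increments it causes itself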
++      Changes |= insertWait(MBB, I, handleOperands(*I));
++      pushInstruction(*I);
++    }
++
++    // Wait for everything at the end of the MBB
++    Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);
++  }
++
++  return Changes;
++}
 diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
 new file mode 100644
-index 0000000..aea3b5a
+index 0000000..40e37aa
 --- /dev/null
 +++ b/lib/Target/R600/SIInstrFormats.td
-@@ -0,0 +1,146 @@
+@@ -0,0 +1,188 @@
 +//===-- SIInstrFormats.td - SI Instruction Formats ------------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18666,40 +20237,23 @@ index 0000000..aea3b5a
 +//
 +//===----------------------------------------------------------------------===//
 +
-+class VOP3b_2IN <bits<9> op, string opName, RegisterClass dstClass,
-+                 RegisterClass src0Class, RegisterClass src1Class,
-+                 list<dag> pattern>
-+  : VOP3b <op, (outs dstClass:$vdst),
-+               (ins src0Class:$src0, src1Class:$src1, InstFlag:$src2, InstFlag:$sdst,
-+                    InstFlag:$omod, InstFlag:$neg),
-+           opName, pattern
-+>;
-+
-+
-+class VOP3_1_32 <bits<9> op, string opName, list<dag> pattern>
-+  : VOP3b_2IN <op, opName, SReg_1, AllReg_32, VReg_32, pattern>;
-+
 +class VOP3_32 <bits<9> op, string opName, list<dag> pattern>
-+  : VOP3 <op, (outs VReg_32:$dst), (ins AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
++  : VOP3 <op, (outs VReg_32:$dst), (ins VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
 +
 +class VOP3_64 <bits<9> op, string opName, list<dag> pattern>
-+  : VOP3 <op, (outs VReg_64:$dst), (ins AllReg_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
-+
++  : VOP3 <op, (outs VReg_64:$dst), (ins VSrc_64:$src0, VReg_64:$src1, VReg_64:$src2, i32imm:$src3, i32imm:$src4, i32imm:$src5, i32imm:$src6), opName, pattern>;
 +
 +class SOP1_32 <bits<8> op, string opName, list<dag> pattern>
-+  : SOP1 <op, (outs SReg_32:$dst), (ins SReg_32:$src0), opName, pattern>;
++  : SOP1 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0), opName, pattern>;
 +
 +class SOP1_64 <bits<8> op, string opName, list<dag> pattern>
-+  : SOP1 <op, (outs SReg_64:$dst), (ins SReg_64:$src0), opName, pattern>;
++  : SOP1 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0), opName, pattern>;
 +
 +class SOP2_32 <bits<7> op, string opName, list<dag> pattern>
-+  : SOP2 <op, (outs SReg_32:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
++  : SOP2 <op, (outs SReg_32:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
 +
 +class SOP2_64 <bits<7> op, string opName, list<dag> pattern>
-+  : SOP2 <op, (outs SReg_64:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
-+
-+class SOP2_VCC <bits<7> op, string opName, list<dag> pattern>
-+  : SOP2 <op, (outs SReg_1:$vcc), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
++  : SOP2 <op, (outs SReg_64:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
 +
 +class VOP1_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
 +                   string opName, list<dag> pattern> : 
@@ -18708,7 +20262,7 @@ index 0000000..aea3b5a
 +  >;
 +
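++// Each VALU op is defined twice: _e32 is the compact 32-bit encoding,
++// _e64 the 64-bit VOP3 encoding of the same opcode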
 +multiclass VOP1_32 <bits<8> op, string opName, list<dag> pattern> {
-+  def _e32: VOP1_Helper <op, VReg_32, AllReg_32, opName, pattern>;
++  def _e32: VOP1_Helper <op, VReg_32, VSrc_32, opName, pattern>;
 +  def _e64 : VOP3_32 <{1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
 +                      opName, []
 +  >;
@@ -18716,7 +20270,7 @@ index 0000000..aea3b5a
 +
 +multiclass VOP1_64 <bits<8> op, string opName, list<dag> pattern> {
 +
-+  def _e32 : VOP1_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++  def _e32 : VOP1_Helper <op, VReg_64, VSrc_64, opName, pattern>;
 +
 +  def _e64 : VOP3_64 <
 +    {1, 1, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
@@ -18732,7 +20286,7 @@ index 0000000..aea3b5a
 +
 +multiclass VOP2_32 <bits<6> op, string opName, list<dag> pattern> {
 +
-+  def _e32 : VOP2_Helper <op, VReg_32, AllReg_32, opName, pattern>;
++  def _e32 : VOP2_Helper <op, VReg_32, VSrc_32, opName, pattern>;
 +
 +  def _e64 : VOP3_32 <{1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
 +                      opName, []
@@ -18740,7 +20294,7 @@ index 0000000..aea3b5a
 +}
 +
 +multiclass VOP2_64 <bits<6> op, string opName, list<dag> pattern> {
-+  def _e32: VOP2_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++  def _e32: VOP2_Helper <op, VReg_64, VSrc_64, opName, pattern>;
 +
 +  def _e64 : VOP3_64 <
 +    {1, 0, 0, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
@@ -18754,47 +20308,106 @@ index 0000000..aea3b5a
 +class SOPK_64 <bits<5> op, string opName, list<dag> pattern>
 +  : SOPK <op, (outs SReg_64:$dst), (ins i16imm:$src0), opName, pattern>;
 +
-+class VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
-+                 string opName, list<dag> pattern> :
-+  VOPC <
-+    op, (ins arc:$src0, vrc:$src1), opName, pattern
-+  >;
++multiclass VOPC_Helper <bits<8> op, RegisterClass vrc, RegisterClass arc,
++                        string opName, list<dag> pattern> {
 +
-+multiclass VOPC_32 <bits<9> op, string opName, list<dag> pattern> {
++  def _e32 : VOPC <op, (ins arc:$src0, vrc:$src1), opName, pattern>;
++  def _e64 : VOP3 <
++    {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
++    (outs SReg_64:$dst),
++    (ins arc:$src0, vrc:$src1,
++         InstFlag:$abs, InstFlag:$clamp,
++         InstFlag:$omod, InstFlag:$neg),
++    opName, pattern
++  > {
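++    // VOPC has no real third source; SRC2 = 0x80 (the inline constant
++    // zero) just fills the unused slot in the VOP3 encoding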
++    let SRC2 = 0x80;
++  }
++}
 +
-+  def _e32 : VOPC_Helper <
-+    {op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-+    VReg_32, AllReg_32, opName, pattern
-+  >;
++multiclass VOPC_32 <bits<8> op, string opName, list<dag> pattern>
++  : VOPC_Helper <op, VReg_32, VSrc_32, opName, pattern>;
 +
-+  def _e64 : VOP3_1_32 <
-+    op,
-+    opName, pattern
-+  >;
++multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern>
++  : VOPC_Helper <op, VReg_64, VSrc_64, opName, pattern>;
++
++class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
++  : SOPC <op, (outs SCCReg:$dst), (ins SSrc_32:$src0, SSrc_32:$src1), opName, pattern>;
++
++class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
++  : SOPC <op, (outs SCCReg:$dst), (ins SSrc_64:$src0, SSrc_64:$src1), opName, pattern>;
++
++class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
++  op,
++  (outs VReg_128:$vdata),
++  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
++       i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_32:$vaddr,
++       GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
++  asm,
++  []> {
++  let mayLoad = 1;
++  let mayStore = 0;
 +}
 +
-+multiclass VOPC_64 <bits<8> op, string opName, list<dag> pattern> {
++class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
++  op,
++  (outs),
++  (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
++   i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
++   GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
++  asm,
++  []> {
++  let mayStore = 1;
++  let mayLoad = 0;
++}
 +
-+  def _e32 : VOPC_Helper <op, VReg_64, AllReg_64, opName, pattern>;
++class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
++  op,
++  (outs regClass:$dst),
++  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
++       i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
++       i1imm:$tfe, SSrc_32:$soffset),
++  asm,
++  []> {
++  let mayLoad = 1;
++  let mayStore = 0;
++}
 +
-+  def _e64 : VOP3_64 <
-+    {0, op{7}, op{6}, op{5}, op{4}, op{3}, op{2}, op{1}, op{0}},
-+    opName, []
-+  >;
++class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
++  op,
++  (outs regClass:$dst),
++  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
++       i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
++       i1imm:$slc, i1imm:$tfe, SSrc_32:$soffset),
++  asm,
++  []> {
++  let mayLoad = 1;
++  let mayStore = 0;
 +}
 +
-+class SOPC_32 <bits<7> op, string opName, list<dag> pattern>
-+  : SOPC <op, (outs SCCReg:$dst), (ins SReg_32:$src0, SReg_32:$src1), opName, pattern>;
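++// SMRD loads come in two forms: _IMM encodes the offset as an immediate,
++// _SGPR reads it from a scalar register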
++multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass> {
++  def _IMM : SMRD <
++             op, 1,
++             (outs dstClass:$dst),
++             (ins GPR2Align<SReg_64>:$sbase, i32imm:$offset),
++             asm,
++             []
++  >;
 +
-+class SOPC_64 <bits<7> op, string opName, list<dag> pattern>
-+  : SOPC <op, (outs SCCReg:$dst), (ins SReg_64:$src0, SReg_64:$src1), opName, pattern>;
++  def _SGPR : SMRD <
++              op, 0,
++              (outs dstClass:$dst),
++              (ins GPR2Align<SReg_64>:$sbase, SReg_32:$soff),
++              asm,
++              []
++  >;
++}
 +
 diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
 new file mode 100644
-index 0000000..adcffa8
+index 0000000..1c4b3cf
 --- /dev/null
 +++ b/lib/Target/R600/SIInstrInfo.cpp
-@@ -0,0 +1,90 @@
+@@ -0,0 +1,143 @@
 +//===-- SIInstrInfo.cpp - SI Instruction Information  ---------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18839,7 +20452,15 @@ index 0000000..adcffa8
 +  // never be necessary.
 +  assert(DestReg != AMDGPU::SCC && SrcReg != AMDGPU::SCC);
 +
-+  if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
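++  // There is no 64-bit VALU move, so 64-bit VGPRs are copied as two
++  // 32-bit halves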
++  if (AMDGPU::VReg_64RegClass.contains(DestReg)) {
++    assert(AMDGPU::VReg_64RegClass.contains(SrcReg) ||
++           AMDGPU::SReg_64RegClass.contains(SrcReg));
++    BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub0))
++            .addReg(RI.getSubReg(SrcReg, AMDGPU::sub0), getKillRegState(KillSrc))
++            .addReg(DestReg, RegState::Define | RegState::Implicit);
++    BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), RI.getSubReg(DestReg, AMDGPU::sub1))
++            .addReg(RI.getSubReg(SrcReg, AMDGPU::sub1), getKillRegState(KillSrc));
++  } else if (AMDGPU::SReg_64RegClass.contains(DestReg)) {
 +    assert(AMDGPU::SReg_64RegClass.contains(SrcReg));
 +    BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B64), DestReg)
 +            .addReg(SrcReg, getKillRegState(KillSrc));
@@ -18858,8 +20479,8 @@ index 0000000..adcffa8
 +
 +MachineInstr * SIInstrInfo::getMovImmInstr(MachineFunction *MF, unsigned DstReg,
 +                                           int64_t Imm) const {
-+  MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_IMM_I32), DebugLoc());
-+  MachineInstrBuilder MIB(*MF, MI);
++  MachineInstr * MI = MF->CreateMachineInstr(get(AMDGPU::V_MOV_B32_e32), DebugLoc());
++  MachineInstrBuilder MIB(MI);
 +  MIB.addReg(DstReg, RegState::Define);
 +  MIB.addImm(Imm);
 +
@@ -18874,9 +20495,6 @@ index 0000000..adcffa8
 +  case AMDGPU::S_MOV_B64:
 +  case AMDGPU::V_MOV_B32_e32:
 +  case AMDGPU::V_MOV_B32_e64:
-+  case AMDGPU::V_MOV_IMM_F32:
-+  case AMDGPU::V_MOV_IMM_I32:
-+  case AMDGPU::S_MOV_IMM_I32:
 +    return true;
 +  }
 +}
@@ -18885,12 +20503,60 @@ index 0000000..adcffa8
 +SIInstrInfo::isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
 +  return RC != &AMDGPU::EXECRegRegClass;
 +}
++
++//===----------------------------------------------------------------------===//
++// Indirect addressing callbacks
++//===----------------------------------------------------------------------===//
++
++unsigned SIInstrInfo::calculateIndirectAddress(unsigned RegIndex,
++                                                 unsigned Channel) const {
++  assert(Channel == 0);
++  return RegIndex;
++}
++
++
++int SIInstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
++  llvm_unreachable("Unimplemented");
++}
++
++int SIInstrInfo::getIndirectIndexEnd(const MachineFunction &MF) const {
++  llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrStoreRegClass(
++                                                     unsigned SourceReg) const {
++  llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getIndirectAddrLoadRegClass() const {
++  llvm_unreachable("Unimplemented");
++}
++
++MachineInstrBuilder SIInstrInfo::buildIndirectWrite(
++                                   MachineBasicBlock *MBB,
++                                   MachineBasicBlock::iterator I,
++                                   unsigned ValueReg,
++                                   unsigned Address, unsigned OffsetReg) const {
++  llvm_unreachable("Unimplemented");
++}
++
++MachineInstrBuilder SIInstrInfo::buildIndirectRead(
++                                   MachineBasicBlock *MBB,
++                                   MachineBasicBlock::iterator I,
++                                   unsigned ValueReg,
++                                   unsigned Address, unsigned OffsetReg) const {
++  llvm_unreachable("Unimplemented");
++}
++
++const TargetRegisterClass *SIInstrInfo::getSuperIndirectRegClass() const {
++  llvm_unreachable("Unimplemented");
++}
 diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h
 new file mode 100644
-index 0000000..631f6c0
+index 0000000..a65f7b6
 --- /dev/null
 +++ b/lib/Target/R600/SIInstrInfo.h
-@@ -0,0 +1,62 @@
+@@ -0,0 +1,84 @@
 +//===-- SIInstrInfo.h - SI Instruction Info Interface ---------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18928,12 +20594,6 @@ index 0000000..631f6c0
 +                           unsigned DestReg, unsigned SrcReg,
 +                           bool KillSrc) const;
 +
-+  /// \returns the encoding type of this instruction.
-+  unsigned getEncodingType(const MachineInstr &MI) const;
-+
-+  /// \returns the size of this instructions encoding in number of bytes.
-+  unsigned getEncodingBytes(const MachineInstr &MI) const;
-+
 +  virtual MachineInstr * getMovImmInstr(MachineFunction *MF, unsigned DstReg,
 +                                        int64_t Imm) const;
 +
@@ -18941,6 +20601,32 @@ index 0000000..631f6c0
 +  virtual bool isMov(unsigned Opcode) const;
 +
 +  virtual bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const;
++
++  virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
++
++  virtual int getIndirectIndexEnd(const MachineFunction &MF) const;
++
++  virtual unsigned calculateIndirectAddress(unsigned RegIndex,
++                                            unsigned Channel) const;
++
++  virtual const TargetRegisterClass *getIndirectAddrStoreRegClass(
++                                                      unsigned SourceReg) const;
++
++  virtual const TargetRegisterClass *getIndirectAddrLoadRegClass() const;
++
++  virtual MachineInstrBuilder buildIndirectWrite(MachineBasicBlock *MBB,
++                                                 MachineBasicBlock::iterator I,
++                                                 unsigned ValueReg,
++                                                 unsigned Address,
++                                                 unsigned OffsetReg) const;
++
++  virtual MachineInstrBuilder buildIndirectRead(MachineBasicBlock *MBB,
++                                                MachineBasicBlock::iterator I,
++                                                unsigned ValueReg,
++                                                unsigned Address,
++                                                unsigned OffsetReg) const;
++
++  virtual const TargetRegisterClass *getSuperIndirectRegClass() const;
 +  };
 +
 +} // End namespace llvm
@@ -18948,17 +20634,19 @@ index 0000000..631f6c0
 +namespace SIInstrFlags {
 +  enum Flags {
 +    // First 4 bits are the instruction encoding
-+    NEED_WAIT = 1 << 4
++    VM_CNT = 1 << 0,
++    EXP_CNT = 1 << 1,
++    LGKM_CNT = 1 << 2
 +  };
 +}
 +
 +#endif //SIINSTRINFO_H
 diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
 new file mode 100644
-index 0000000..873a451
+index 0000000..8c4e5af
 --- /dev/null
 +++ b/lib/Target/R600/SIInstrInfo.td
-@@ -0,0 +1,589 @@
+@@ -0,0 +1,465 @@
 +//===-- SIInstrInfo.td - SI Instruction Encodings ---------*- tablegen -*--===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -18969,57 +20657,66 @@ index 0000000..873a451
 +//===----------------------------------------------------------------------===//
 +
 +//===----------------------------------------------------------------------===//
-+// SI DAG Profiles
-+//===----------------------------------------------------------------------===//
-+def SDTVCCBinaryOp : SDTypeProfile<1, 2, [
-+  SDTCisInt<0>, SDTCisInt<1>, SDTCisSameAs<1, 2>
-+]>;
-+
-+//===----------------------------------------------------------------------===//
 +// SI DAG Nodes
 +//===----------------------------------------------------------------------===//
 +
-+// and operation on 64-bit wide vcc
-+def SIsreg1_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
-+  [SDNPCommutative, SDNPAssociative]
++// SMRD takes a 64-bit memory address and can only add a 32-bit offset
++def SIadd64bit32bit : SDNode<"ISD::ADD",
++  SDTypeProfile<1, 2, [SDTCisSameAs<0, 1>, SDTCisVT<0, i64>, SDTCisVT<2, i32>]>
 +>;
 +
-+// Special bitcast node for sharing VCC register between VALU and SALU
-+def SIsreg1_bitcast : SDNode<"SIISD::VCC_BITCAST",
-+  SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
-+>;
++// Transformation function: extract the lower 32 bits of a 64-bit immediate
++def LO32 : SDNodeXForm<imm, [{
++  return CurDAG->getTargetConstant(N->getZExtValue() & 0xffffffff, MVT::i32);
++}]>;
 +
-+// and operation on 64-bit wide vcc
-+def SIvcc_and : SDNode<"SIISD::VCC_AND", SDTVCCBinaryOp,
-+  [SDNPCommutative, SDNPAssociative]
++// Transformation function: extract the upper 32 bits of a 64-bit immediate
++def HI32 : SDNodeXForm<imm, [{
++  return CurDAG->getTargetConstant(N->getZExtValue() >> 32, MVT::i32);
++}]>;
++
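++// Match byte offsets that fit an 8-bit dword offset: multiples of 4 up to
++// 1020; the attached transform re-encodes them in dword units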
++def IMM8bitDWORD : ImmLeaf <
++  i32, [{
++    return (Imm & ~0x3FC) == 0;
++  }], SDNodeXForm<imm, [{
++    return CurDAG->getTargetConstant(
++      N->getZExtValue() >> 2, MVT::i32);
++  }]>
 +>;
 +
-+// Special bitcast node for sharing VCC register between VALU and SALU
-+def SIvcc_bitcast : SDNode<"SIISD::VCC_BITCAST",
-+  SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisInt<1>]>
++def IMM12bit : ImmLeaf <
++  i16,
++  [{return isUInt<12>(Imm);}]
 +>;
 +
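++// Integer constants in [-16, 64] can be encoded inline in the instruction
++// word instead of taking up a literal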
++class InlineImm <ValueType vt> : ImmLeaf <vt, [{
++  return -16 <= Imm && Imm <= 64;
++}]>;
++
 +class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
 +    AMDGPUInst<outs, ins, asm, pattern> {
 +
-+  field bits<4> EncodingType = 0;
-+  field bits<1> NeedWait = 0;
-+
-+  let TSFlags{3-0} = EncodingType;
-+  let TSFlags{4} = NeedWait;
++  field bits<1> VM_CNT = 0;
++  field bits<1> EXP_CNT = 0;
++  field bits<1> LGKM_CNT = 0;
 +
++  let TSFlags{0} = VM_CNT;
++  let TSFlags{1} = EXP_CNT;
++  let TSFlags{2} = LGKM_CNT;
 +}
 +
 +class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> :
 +    InstSI <outs, ins, asm, pattern> {
 +
 +  field bits<32> Inst;
++  let Size = 4;
 +}
 +
 +class Enc64 <dag outs, dag ins, string asm, list<dag> pattern> :
 +    InstSI <outs, ins, asm, pattern> {
 +
 +  field bits<64> Inst;
++  let Size = 8;
 +}
 +
 +class SIOperand <ValueType vt, dag opInfo>: Operand <vt> {
@@ -19027,49 +20724,16 @@ index 0000000..873a451
 +  let MIOperandInfo = opInfo;
 +}
 +
-+def IMM16bit : ImmLeaf <
-+  i16,
-+  [{return isInt<16>(Imm);}]
-+>;
-+
-+def IMM8bit : ImmLeaf <
-+  i32,
-+  [{return (int32_t)Imm >= 0 && (int32_t)Imm <= 0xff;}]
-+>;
-+
-+def IMM12bit : ImmLeaf <
-+  i16,
-+  [{return (int16_t)Imm >= 0 && (int16_t)Imm <= 0xfff;}]
-+>;
-+
-+def IMM32bitIn64bit : ImmLeaf <
-+  i64,
-+  [{return isInt<32>(Imm);}]
-+>;
-+
 +class GPR4Align <RegisterClass rc> : Operand <vAny> {
 +  let EncoderMethod = "GPR4AlignEncode";
 +  let MIOperandInfo = (ops rc:$reg); 
 +}
 +
-+class GPR2Align <RegisterClass rc, ValueType vt> : Operand <vt> {
++class GPR2Align <RegisterClass rc> : Operand <iPTR> {
 +  let EncoderMethod = "GPR2AlignEncode";
 +  let MIOperandInfo = (ops rc:$reg);
 +}
 +
-+def SMRDmemrr : Operand<iPTR> {
-+  let MIOperandInfo = (ops SReg_64, SReg_32);
-+  let EncoderMethod = "GPR2AlignEncode";
-+}
-+
-+def SMRDmemri : Operand<iPTR> {
-+  let MIOperandInfo = (ops SReg_64, i32imm);
-+  let EncoderMethod = "SMRDmemriEncode";
-+}
-+
-+def ADDR_Reg     : ComplexPattern<i64, 2, "SelectADDRReg", [], []>;
-+def ADDR_Offset8 : ComplexPattern<i64, 2, "SelectADDR8BitOffset", [], []>;
-+
 +let Uses = [EXEC] in {
 +
 +def EXP : Enc64<
@@ -19099,10 +20763,8 @@ index 0000000..873a451
 +  let Inst{47-40} = VSRC1;
 +  let Inst{55-48} = VSRC2;
 +  let Inst{63-56} = VSRC3;
-+  let EncodingType = 0; //SIInstrEncodingType::EXP
 +
-+  let NeedWait = 1;
-+  let usesCustomInserter = 1;
++  let EXP_CNT = 1;
 +}
 +
 +class MIMG <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19136,10 +20798,8 @@ index 0000000..873a451
 +  let Inst{52-48} = SRSRC;
 +  let Inst{57-53} = SSAMP;
 +
-+  let EncodingType = 2; //SIInstrEncodingType::MIMG
-+
-+  let NeedWait = 1;
-+  let usesCustomInserter = 1;
++  let VM_CNT = 1;
++  let EXP_CNT = 1;
 +}
 +
 +class MTBUF <bits<3> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19174,10 +20834,10 @@ index 0000000..873a451
 +  let Inst{54} = SLC;
 +  let Inst{55} = TFE;
 +  let Inst{63-56} = SOFFSET;
-+  let EncodingType = 3; //SIInstrEncodingType::MTBUF
 +
-+  let NeedWait = 1;
-+  let usesCustomInserter = 1;
++  let VM_CNT = 1;
++  let EXP_CNT = 1;
++
 +  let neverHasSideEffects = 1;
 +}
 +
@@ -19211,34 +20871,30 @@ index 0000000..873a451
 +  let Inst{54} = SLC;
 +  let Inst{55} = TFE;
 +  let Inst{63-56} = SOFFSET;
-+  let EncodingType = 4; //SIInstrEncodingType::MUBUF
 +
-+  let NeedWait = 1;
-+  let usesCustomInserter = 1;
++  let VM_CNT = 1;
++  let EXP_CNT = 1;
++
 +  let neverHasSideEffects = 1;
 +}
 +
 +} // End Uses = [EXEC]
 +
-+class SMRD <bits<5> op, dag outs, dag ins, string asm, list<dag> pattern> :
-+    Enc32<outs, ins, asm, pattern> {
++class SMRD <bits<5> op, bits<1> imm, dag outs, dag ins, string asm,
++            list<dag> pattern> : Enc32<outs, ins, asm, pattern> {
 +
 +  bits<7> SDST;
-+  bits<15> PTR;
-+  bits<8> OFFSET = PTR{7-0};
-+  bits<1> IMM    = PTR{8};
-+  bits<6> SBASE  = PTR{14-9};
++  bits<6> SBASE;
++  bits<8> OFFSET;
 +  
 +  let Inst{7-0} = OFFSET;
-+  let Inst{8} = IMM;
++  let Inst{8} = imm;
 +  let Inst{14-9} = SBASE;
 +  let Inst{21-15} = SDST;
 +  let Inst{26-22} = op;
 +  let Inst{31-27} = 0x18; //encoding
-+  let EncodingType = 5; //SIInstrEncodingType::SMRD
 +
-+  let NeedWait = 1;
-+  let usesCustomInserter = 1;
++  let LGKM_CNT = 1;
 +}
 +
 +class SOP1 <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
@@ -19251,7 +20907,6 @@ index 0000000..873a451
 +  let Inst{15-8} = op;
 +  let Inst{22-16} = SDST;
 +  let Inst{31-23} = 0x17d; //encoding;
-+  let EncodingType = 6; //SIInstrEncodingType::SOP1
 +
 +  let mayLoad = 0;
 +  let mayStore = 0;
@@ -19270,7 +20925,6 @@ index 0000000..873a451
 +  let Inst{22-16} = SDST;
 +  let Inst{29-23} = op;
 +  let Inst{31-30} = 0x2; // encoding
-+  let EncodingType = 7; // SIInstrEncodingType::SOP2  
 +
 +  let mayLoad = 0;
 +  let mayStore = 0;
@@ -19287,7 +20941,6 @@ index 0000000..873a451
 +  let Inst{15-8} = SSRC1;
 +  let Inst{22-16} = op;
 +  let Inst{31-23} = 0x17e;
-+  let EncodingType = 8; // SIInstrEncodingType::SOPC
 +
 +  let DisableEncoding = "$dst";
 +  let mayLoad = 0;
@@ -19305,7 +20958,6 @@ index 0000000..873a451
 +  let Inst{22-16} = SDST;
 +  let Inst{27-23} = op;
 +  let Inst{31-28} = 0xb; //encoding
-+  let EncodingType = 9; // SIInstrEncodingType::SOPK
 +
 +  let mayLoad = 0;
 +  let mayStore = 0;
@@ -19323,7 +20975,6 @@ index 0000000..873a451
 +  let Inst{15-0} = SIMM16;
 +  let Inst{22-16} = op;
 +  let Inst{31-23} = 0x17f; // encoding
-+  let EncodingType = 10; // SIInstrEncodingType::SOPP
 +
 +  let mayLoad = 0;
 +  let mayStore = 0;
@@ -19346,7 +20997,6 @@ index 0000000..873a451
 +  let Inst{17-16} = op;
 +  let Inst{25-18} = VDST;
 +  let Inst{31-26} = 0x32; // encoding
-+  let EncodingType = 11; // SIInstrEncodingType::VINTRP
 +
 +  let neverHasSideEffects = 1;
 +  let mayLoad = 1;
@@ -19364,9 +21014,6 @@ index 0000000..873a451
 +  let Inst{24-17} = VDST;
 +  let Inst{31-25} = 0x3f; //encoding
 +  
-+  let EncodingType = 12; // SIInstrEncodingType::VOP1
-+  let PostEncoderMethod = "VOPPostEncode";
-+
 +  let mayLoad = 0;
 +  let mayStore = 0;
 +  let hasSideEffects = 0;
@@ -19385,9 +21032,6 @@ index 0000000..873a451
 +  let Inst{30-25} = op;
 +  let Inst{31} = 0x0; //encoding
 +  
-+  let EncodingType = 13; // SIInstrEncodingType::VOP2
-+  let PostEncoderMethod = "VOPPostEncode";
-+
 +  let mayLoad = 0;
 +  let mayStore = 0;
 +  let hasSideEffects = 0;
@@ -19416,9 +21060,6 @@ index 0000000..873a451
 +  let Inst{60-59} = OMOD;
 +  let Inst{63-61} = NEG;
 +  
-+  let EncodingType = 14; // SIInstrEncodingType::VOP3
-+  let PostEncoderMethod = "VOPPostEncode";
-+
 +  let mayLoad = 0;
 +  let mayStore = 0;
 +  let hasSideEffects = 0;
@@ -19433,127 +21074,50 @@ index 0000000..873a451
 +  bits<9> SRC2;
 +  bits<7> SDST;
 +  bits<2> OMOD;
-+  bits<3> NEG;
-+
-+  let Inst{7-0} = VDST;
-+  let Inst{14-8} = SDST;
-+  let Inst{25-17} = op;
-+  let Inst{31-26} = 0x34; //encoding
-+  let Inst{40-32} = SRC0;
-+  let Inst{49-41} = SRC1;
-+  let Inst{58-50} = SRC2;
-+  let Inst{60-59} = OMOD;
-+  let Inst{63-61} = NEG;
-+
-+  let EncodingType = 14; // SIInstrEncodingType::VOP3
-+  let PostEncoderMethod = "VOPPostEncode";
-+
-+  let mayLoad = 0;
-+  let mayStore = 0;
-+  let hasSideEffects = 0;
-+}
-+
-+class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
-+    Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
-+
-+  bits<9> SRC0;
-+  bits<8> VSRC1;
-+
-+  let Inst{8-0} = SRC0;
-+  let Inst{16-9} = VSRC1;
-+  let Inst{24-17} = op;
-+  let Inst{31-25} = 0x3e;
-+ 
-+  let EncodingType = 15; //SIInstrEncodingType::VOPC
-+  let PostEncoderMethod = "VOPPostEncode";
-+  let DisableEncoding = "$dst";
-+  let mayLoad = 0;
-+  let mayStore = 0;
-+  let hasSideEffects = 0;
-+}
-+
-+} // End Uses = [EXEC]
-+
-+class MIMG_Load_Helper <bits<7> op, string asm> : MIMG <
-+  op,
-+  (outs VReg_128:$vdata),
-+  (ins i32imm:$dmask, i1imm:$unorm, i1imm:$glc, i1imm:$da, i1imm:$r128,
-+       i1imm:$tfe, i1imm:$lwe, i1imm:$slc, VReg_128:$vaddr,
-+       GPR4Align<SReg_256>:$srsrc, GPR4Align<SReg_128>:$ssamp),
-+  asm,
-+  []> {
-+  let mayLoad = 1;
-+  let mayStore = 0;
-+}
-+
-+class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
-+  op,
-+  (outs regClass:$dst),
-+  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-+       i1imm:$lds, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc, i1imm:$slc,
-+       i1imm:$tfe, SReg_32:$soffset),
-+  asm,
-+  []> {
-+  let mayLoad = 1;
-+  let mayStore = 0;
-+}
++  bits<3> NEG;
 +
-+class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
-+  op,
-+  (outs regClass:$dst),
-+  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
-+       i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr, GPR4Align<SReg_128>:$srsrc,
-+       i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-+  asm,
-+  []> {
-+  let mayLoad = 1;
-+  let mayStore = 0;
-+}
++  let Inst{7-0} = VDST;
++  let Inst{14-8} = SDST;
++  let Inst{25-17} = op;
++  let Inst{31-26} = 0x34; //encoding
++  let Inst{40-32} = SRC0;
++  let Inst{49-41} = SRC1;
++  let Inst{58-50} = SRC2;
++  let Inst{60-59} = OMOD;
++  let Inst{63-61} = NEG;
 +
-+class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
-+  op,
-+  (outs),
-+  (ins regClass:$vdata, i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc,
-+   i1imm:$addr64, i8imm:$dfmt, i8imm:$nfmt, VReg_32:$vaddr,
-+   GPR4Align<SReg_128>:$srsrc, i1imm:$slc, i1imm:$tfe, SReg_32:$soffset),
-+  asm,
-+  []> {
-+  let mayStore = 1;
 +  let mayLoad = 0;
++  let mayStore = 0;
++  let hasSideEffects = 0;
 +}
 +
-+multiclass SMRD_Helper <bits<5> op, string asm, RegisterClass dstClass,
-+                        ValueType vt> {
-+  def _IMM : SMRD <
-+              op,
-+              (outs dstClass:$dst),
-+              (ins SMRDmemri:$src0),
-+              asm,
-+              [(set (vt dstClass:$dst), (constant_load ADDR_Offset8:$src0))]
-+  >;
++class VOPC <bits<8> op, dag ins, string asm, list<dag> pattern> :
++    Enc32 <(outs VCCReg:$dst), ins, asm, pattern> {
 +
-+  def _SGPR : SMRD <
-+              op,
-+              (outs dstClass:$dst),
-+              (ins SMRDmemrr:$src0),
-+              asm,
-+              [(set (vt dstClass:$dst), (constant_load ADDR_Reg:$src0))]
-+  >;
-+}
++  bits<9> SRC0;
++  bits<8> VSRC1;
 +
-+multiclass SMRD_32 <bits<5> op, string asm, RegisterClass dstClass> {
-+  defm _F32 : SMRD_Helper <op, asm, dstClass, f32>;
-+  defm _I32 : SMRD_Helper <op, asm, dstClass, i32>;
++  let Inst{8-0} = SRC0;
++  let Inst{16-9} = VSRC1;
++  let Inst{24-17} = op;
++  let Inst{31-25} = 0x3e;
++ 
++  let DisableEncoding = "$dst";
++  let mayLoad = 0;
++  let mayStore = 0;
++  let hasSideEffects = 0;
 +}
 +
++} // End Uses = [EXEC]
++
 +include "SIInstrFormats.td"
 +include "SIInstructions.td"
 diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
 new file mode 100644
-index 0000000..005be96
+index 0000000..3a9822a
 --- /dev/null
 +++ b/lib/Target/R600/SIInstructions.td
-@@ -0,0 +1,1351 @@
+@@ -0,0 +1,1462 @@
 +//===-- SIInstructions.td - SI Instruction Definitions --------------------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -19567,6 +21131,17 @@ index 0000000..005be96
 +// that are not yet supported remain commented out.
 +//===----------------------------------------------------------------------===//
 +
++class InterpSlots {
++  int P0 = 2;
++  int P10 = 0;
++  int P20 = 1;
++}
++def INTERP : InterpSlots;
++
++def InterpSlot : Operand<i32> {
++  let PrintMethod = "printInterpSlot";
++}
++
 +def isSI : Predicate<"Subtarget.device()"
 +                            "->getGeneration() == AMDGPUDeviceInfo::HD7XXX">;
 +
@@ -19675,33 +21250,33 @@ index 0000000..005be96
 +defm V_CMP_F_F32 : VOPC_32 <0x00000000, "V_CMP_F_F32", []>;
 +defm V_CMP_LT_F32 : VOPC_32 <0x00000001, "V_CMP_LT_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
-+  (V_CMP_LT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
++  (V_CMP_LT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_EQ_F32 : VOPC_32 <0x00000002, "V_CMP_EQ_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
-+  (V_CMP_EQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
++  (V_CMP_EQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_LE_F32 : VOPC_32 <0x00000003, "V_CMP_LE_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
-+  (V_CMP_LE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
++  (V_CMP_LE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_GT_F32 : VOPC_32 <0x00000004, "V_CMP_GT_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
-+  (V_CMP_GT_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
++  (V_CMP_GT_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_LG_F32 : VOPC_32 <0x00000005, "V_CMP_LG_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+  (V_CMP_LG_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++  (V_CMP_LG_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_GE_F32 : VOPC_32 <0x00000006, "V_CMP_GE_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
-+  (V_CMP_GE_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
++  (V_CMP_GE_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_O_F32 : VOPC_32 <0x00000007, "V_CMP_O_F32", []>;
 +defm V_CMP_U_F32 : VOPC_32 <0x00000008, "V_CMP_U_F32", []>;
@@ -19711,8 +21286,8 @@ index 0000000..005be96
 +defm V_CMP_NLE_F32 : VOPC_32 <0x0000000c, "V_CMP_NLE_F32", []>;
 +defm V_CMP_NEQ_F32 : VOPC_32 <0x0000000d, "V_CMP_NEQ_F32", []>;
 +def : Pat <
-+  (i1 (setcc (f32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+  (V_CMP_NEQ_F32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (f32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++  (V_CMP_NEQ_F32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_NLT_F32 : VOPC_32 <0x0000000e, "V_CMP_NLT_F32", []>;
 +defm V_CMP_TRU_F32 : VOPC_32 <0x0000000f, "V_CMP_TRU_F32", []>;
@@ -19845,33 +21420,33 @@ index 0000000..005be96
 +defm V_CMP_F_I32 : VOPC_32 <0x00000080, "V_CMP_F_I32", []>;
 +defm V_CMP_LT_I32 : VOPC_32 <0x00000081, "V_CMP_LT_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LT)),
-+  (V_CMP_LT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LT)),
++  (V_CMP_LT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_EQ_I32 : VOPC_32 <0x00000082, "V_CMP_EQ_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_EQ)),
-+  (V_CMP_EQ_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_EQ)),
++  (V_CMP_EQ_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_LE_I32 : VOPC_32 <0x00000083, "V_CMP_LE_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_LE)),
-+  (V_CMP_LE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_LE)),
++  (V_CMP_LE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_GT_I32 : VOPC_32 <0x00000084, "V_CMP_GT_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GT)),
-+  (V_CMP_GT_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GT)),
++  (V_CMP_GT_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_NE_I32 : VOPC_32 <0x00000085, "V_CMP_NE_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_NE)),
-+  (V_CMP_NE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_NE)),
++  (V_CMP_NE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_GE_I32 : VOPC_32 <0x00000086, "V_CMP_GE_I32", []>;
 +def : Pat <
-+  (i1 (setcc (i32 AllReg_32:$src0), VReg_32:$src1, COND_GE)),
-+  (V_CMP_GE_I32_e64 AllReg_32:$src0, VReg_32:$src1)
++  (i1 (setcc (i32 VSrc_32:$src0), VReg_32:$src1, COND_GE)),
++  (V_CMP_GE_I32_e64 VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_CMP_T_I32 : VOPC_32 <0x00000087, "V_CMP_T_I32", []>;
 +
@@ -20017,11 +21592,13 @@ index 0000000..005be96
 +//def TBUFFER_STORE_FORMAT_XYZ : MTBUF_ <0x00000006, "TBUFFER_STORE_FORMAT_XYZ", []>;
 +//def TBUFFER_STORE_FORMAT_XYZW : MTBUF_ <0x00000007, "TBUFFER_STORE_FORMAT_XYZW", []>;
 +
-+defm S_LOAD_DWORD : SMRD_32 <0x00000000, "S_LOAD_DWORD", SReg_32>;
++let mayLoad = 1 in {
++
++defm S_LOAD_DWORD : SMRD_Helper <0x00000000, "S_LOAD_DWORD", SReg_32>;
 +
 +//def S_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000001, "S_LOAD_DWORDX2", []>;
-+defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128, v4i32>;
-+defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256, v8i32>;
++defm S_LOAD_DWORDX4 : SMRD_Helper <0x00000002, "S_LOAD_DWORDX4", SReg_128>;
++defm S_LOAD_DWORDX8 : SMRD_Helper <0x00000003, "S_LOAD_DWORDX8", SReg_256>;
 +//def S_LOAD_DWORDX16 : SMRD_DWORDX16 <0x00000004, "S_LOAD_DWORDX16", []>;
 +//def S_BUFFER_LOAD_DWORD : SMRD_ <0x00000008, "S_BUFFER_LOAD_DWORD", []>;
 +//def S_BUFFER_LOAD_DWORDX2 : SMRD_DWORDX2 <0x00000009, "S_BUFFER_LOAD_DWORDX2", []>;
@@ -20029,6 +21606,8 @@ index 0000000..005be96
 +//def S_BUFFER_LOAD_DWORDX8 : SMRD_DWORDX8 <0x0000000b, "S_BUFFER_LOAD_DWORDX8", []>;
 +//def S_BUFFER_LOAD_DWORDX16 : SMRD_DWORDX16 <0x0000000c, "S_BUFFER_LOAD_DWORDX16", []>;
 +
++} // mayLoad = 1
++
 +//def S_MEMTIME : SMRD_ <0x0000001e, "S_MEMTIME", []>;
 +//def S_DCACHE_INV : SMRD_ <0x0000001f, "S_DCACHE_INV", []>;
 +//def IMAGE_LOAD : MIMG_NoPattern_ <"IMAGE_LOAD", 0x00000000>;
@@ -20067,12 +21646,12 @@ index 0000000..005be96
 +def IMAGE_SAMPLE_B : MIMG_Load_Helper <0x00000025, "IMAGE_SAMPLE_B">;
 +//def IMAGE_SAMPLE_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_B_CL", 0x00000026>;
 +//def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x00000027>;
-+//def IMAGE_SAMPLE_C : MIMG_NoPattern_ <"IMAGE_SAMPLE_C", 0x00000028>;
++def IMAGE_SAMPLE_C : MIMG_Load_Helper <0x00000028, "IMAGE_SAMPLE_C">;
 +//def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x00000029>;
 +//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x0000002a>;
 +//def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x0000002b>;
-+//def IMAGE_SAMPLE_C_L : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_L", 0x0000002c>;
-+//def IMAGE_SAMPLE_C_B : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B", 0x0000002d>;
++def IMAGE_SAMPLE_C_L : MIMG_Load_Helper <0x0000002c, "IMAGE_SAMPLE_C_L">;
++def IMAGE_SAMPLE_C_B : MIMG_Load_Helper <0x0000002d, "IMAGE_SAMPLE_C_B">;
 +//def IMAGE_SAMPLE_C_B_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_B_CL", 0x0000002e>;
 +//def IMAGE_SAMPLE_C_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_LZ", 0x0000002f>;
 +//def IMAGE_SAMPLE_O : MIMG_NoPattern_ <"IMAGE_SAMPLE_O", 0x00000030>;
@@ -20135,12 +21714,12 @@ index 0000000..005be96
 +//defm V_CVT_I32_F64 : VOP1_32 <0x00000003, "V_CVT_I32_F64", []>;
 +//defm V_CVT_F64_I32 : VOP1_64 <0x00000004, "V_CVT_F64_I32", []>;
 +defm V_CVT_F32_I32 : VOP1_32 <0x00000005, "V_CVT_F32_I32",
-+  [(set VReg_32:$dst, (sint_to_fp AllReg_32:$src0))]
++  [(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))]
 +>;
 +//defm V_CVT_F32_U32 : VOP1_32 <0x00000006, "V_CVT_F32_U32", []>;
 +//defm V_CVT_U32_F32 : VOP1_32 <0x00000007, "V_CVT_U32_F32", []>;
 +defm V_CVT_I32_F32 : VOP1_32 <0x00000008, "V_CVT_I32_F32",
-+  [(set VReg_32:$dst, (fp_to_sint AllReg_32:$src0))]
++  [(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))]
 +>;
 +defm V_MOV_FED_B32 : VOP1_32 <0x00000009, "V_MOV_FED_B32", []>;
 +////def V_CVT_F16_F32 : VOP1_F16 <0x0000000a, "V_CVT_F16_F32", []>;
@@ -20157,31 +21736,35 @@ index 0000000..005be96
 +//defm V_CVT_U32_F64 : VOP1_32 <0x00000015, "V_CVT_U32_F64", []>;
 +//defm V_CVT_F64_U32 : VOP1_64 <0x00000016, "V_CVT_F64_U32", []>;
 +defm V_FRACT_F32 : VOP1_32 <0x00000020, "V_FRACT_F32",
-+  [(set VReg_32:$dst, (AMDGPUfract AllReg_32:$src0))]
++  [(set VReg_32:$dst, (AMDGPUfract VSrc_32:$src0))]
 +>;
 +defm V_TRUNC_F32 : VOP1_32 <0x00000021, "V_TRUNC_F32", []>;
-+defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32", []>;
++defm V_CEIL_F32 : VOP1_32 <0x00000022, "V_CEIL_F32",
++  [(set VReg_32:$dst, (fceil VSrc_32:$src0))]
++>;
 +defm V_RNDNE_F32 : VOP1_32 <0x00000023, "V_RNDNE_F32",
-+  [(set VReg_32:$dst, (frint AllReg_32:$src0))]
++  [(set VReg_32:$dst, (frint VSrc_32:$src0))]
 +>;
 +defm V_FLOOR_F32 : VOP1_32 <0x00000024, "V_FLOOR_F32",
-+  [(set VReg_32:$dst, (ffloor AllReg_32:$src0))]
++  [(set VReg_32:$dst, (ffloor VSrc_32:$src0))]
 +>;
 +defm V_EXP_F32 : VOP1_32 <0x00000025, "V_EXP_F32",
-+  [(set VReg_32:$dst, (fexp2 AllReg_32:$src0))]
++  [(set VReg_32:$dst, (fexp2 VSrc_32:$src0))]
 +>;
 +defm V_LOG_CLAMP_F32 : VOP1_32 <0x00000026, "V_LOG_CLAMP_F32", []>;
-+defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32", []>;
++defm V_LOG_F32 : VOP1_32 <0x00000027, "V_LOG_F32",
++  [(set VReg_32:$dst, (flog2 VSrc_32:$src0))]
++>;
 +defm V_RCP_CLAMP_F32 : VOP1_32 <0x00000028, "V_RCP_CLAMP_F32", []>;
 +defm V_RCP_LEGACY_F32 : VOP1_32 <0x00000029, "V_RCP_LEGACY_F32", []>;
 +defm V_RCP_F32 : VOP1_32 <0x0000002a, "V_RCP_F32",
-+  [(set VReg_32:$dst, (fdiv FP_ONE, AllReg_32:$src0))]
++  [(set VReg_32:$dst, (fdiv FP_ONE, VSrc_32:$src0))]
 +>;
 +defm V_RCP_IFLAG_F32 : VOP1_32 <0x0000002b, "V_RCP_IFLAG_F32", []>;
 +defm V_RSQ_CLAMP_F32 : VOP1_32 <0x0000002c, "V_RSQ_CLAMP_F32", []>;
 +defm V_RSQ_LEGACY_F32 : VOP1_32 <
 +  0x0000002d, "V_RSQ_LEGACY_F32",
-+  [(set VReg_32:$dst, (int_AMDGPU_rsq AllReg_32:$src0))]
++  [(set VReg_32:$dst, (int_AMDGPU_rsq VSrc_32:$src0))]
 +>;
 +defm V_RSQ_F32 : VOP1_32 <0x0000002e, "V_RSQ_F32", []>;
 +defm V_RCP_F64 : VOP1_64 <0x0000002f, "V_RCP_F64", []>;
@@ -20231,10 +21814,9 @@ index 0000000..005be96
 +def V_INTERP_MOV_F32 : VINTRP <
 +  0x00000002,
 +  (outs VReg_32:$dst),
-+  (ins i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
-+  "V_INTERP_MOV_F32",
++  (ins InterpSlot:$src0, i32imm:$attr_chan, i32imm:$attr, M0Reg:$m0),
++  "V_INTERP_MOV_F32 $dst, $src0, $attr_chan, $attr",
 +  []> {
-+  let VSRC = 0;
 +  let DisableEncoding = "$m0";
 +}
 +
@@ -20314,22 +21896,22 @@ index 0000000..005be96
 +//def S_TTRACEDATA : SOPP_ <0x00000016, "S_TTRACEDATA", []>;
 +
 +def V_CNDMASK_B32_e32 : VOP2 <0x00000000, (outs VReg_32:$dst),
-+  (ins AllReg_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
++  (ins VSrc_32:$src0, VReg_32:$src1, VCCReg:$vcc), "V_CNDMASK_B32_e32",
 +  []
 +>{
 +  let DisableEncoding = "$vcc";
 +}
 +
 +def V_CNDMASK_B32_e64 : VOP3 <0x00000100, (outs VReg_32:$dst),
-+  (ins VReg_32:$src0, VReg_32:$src1, SReg_1:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
++  (ins VReg_32:$src0, VReg_32:$src1, SReg_64:$src2, InstFlag:$abs, InstFlag:$clamp, InstFlag:$omod, InstFlag:$neg),
 +  "V_CNDMASK_B32_e64",
-+  [(set (i32 VReg_32:$dst), (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0))]
++  [(set (i32 VReg_32:$dst), (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0))]
 +>;
 +
 +//f32 pattern for V_CNDMASK_B32_e64
 +def : Pat <
-+  (f32 (select SReg_1:$src2, VReg_32:$src1, VReg_32:$src0)),
-+  (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_1:$src2)
++  (f32 (select (i1 SReg_64:$src2), VReg_32:$src1, VReg_32:$src0)),
++  (V_CNDMASK_B32_e64 VReg_32:$src0, VReg_32:$src1, SReg_64:$src2)
 +>;
 +
 +defm V_READLANE_B32 : VOP2_32 <0x00000001, "V_READLANE_B32", []>;
@@ -20337,35 +21919,35 @@ index 0000000..005be96
 +
 +defm V_ADD_F32 : VOP2_32 <0x00000003, "V_ADD_F32", []>;
 +def : Pat <
-+  (f32 (fadd AllReg_32:$src0, VReg_32:$src1)),
-+  (V_ADD_F32_e32  AllReg_32:$src0, VReg_32:$src1)
++  (f32 (fadd VSrc_32:$src0, VReg_32:$src1)),
++  (V_ADD_F32_e32  VSrc_32:$src0, VReg_32:$src1)
 +>;
 +
 +defm V_SUB_F32 : VOP2_32 <0x00000004, "V_SUB_F32", []>;
 +def : Pat <
-+  (f32 (fsub AllReg_32:$src0, VReg_32:$src1)),
-+  (V_SUB_F32_e32  AllReg_32:$src0, VReg_32:$src1)
++  (f32 (fsub VSrc_32:$src0, VReg_32:$src1)),
++  (V_SUB_F32_e32  VSrc_32:$src0, VReg_32:$src1)
 +>;
 +defm V_SUBREV_F32 : VOP2_32 <0x00000005, "V_SUBREV_F32", []>;
 +defm V_MAC_LEGACY_F32 : VOP2_32 <0x00000006, "V_MAC_LEGACY_F32", []>;
 +defm V_MUL_LEGACY_F32 : VOP2_32 <
 +  0x00000007, "V_MUL_LEGACY_F32",
-+  [(set VReg_32:$dst, (int_AMDGPU_mul AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (int_AMDGPU_mul VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +
 +defm V_MUL_F32 : VOP2_32 <0x00000008, "V_MUL_F32",
-+  [(set VReg_32:$dst, (fmul AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (fmul VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +//defm V_MUL_I32_I24 : VOP2_32 <0x00000009, "V_MUL_I32_I24", []>;
 +//defm V_MUL_HI_I32_I24 : VOP2_32 <0x0000000a, "V_MUL_HI_I32_I24", []>;
 +//defm V_MUL_U32_U24 : VOP2_32 <0x0000000b, "V_MUL_U32_U24", []>;
 +//defm V_MUL_HI_U32_U24 : VOP2_32 <0x0000000c, "V_MUL_HI_U32_U24", []>;
 +defm V_MIN_LEGACY_F32 : VOP2_32 <0x0000000d, "V_MIN_LEGACY_F32",
-+  [(set VReg_32:$dst, (AMDGPUfmin AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (AMDGPUfmin VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +
 +defm V_MAX_LEGACY_F32 : VOP2_32 <0x0000000e, "V_MAX_LEGACY_F32",
-+  [(set VReg_32:$dst, (AMDGPUfmax AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (AMDGPUfmax VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +defm V_MIN_F32 : VOP2_32 <0x0000000f, "V_MIN_F32", []>;
 +defm V_MAX_F32 : VOP2_32 <0x00000010, "V_MAX_F32", []>;
@@ -20380,13 +21962,13 @@ index 0000000..005be96
 +defm V_LSHL_B32 : VOP2_32 <0x00000019, "V_LSHL_B32", []>;
 +defm V_LSHLREV_B32 : VOP2_32 <0x0000001a, "V_LSHLREV_B32", []>;
 +defm V_AND_B32 : VOP2_32 <0x0000001b, "V_AND_B32",
-+  [(set VReg_32:$dst, (and AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (and VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +defm V_OR_B32 : VOP2_32 <0x0000001c, "V_OR_B32",
-+  [(set VReg_32:$dst, (or AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (or VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +defm V_XOR_B32 : VOP2_32 <0x0000001d, "V_XOR_B32",
-+  [(set VReg_32:$dst, (xor AllReg_32:$src0, VReg_32:$src1))]
++  [(set VReg_32:$dst, (xor VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +defm V_BFM_B32 : VOP2_32 <0x0000001e, "V_BFM_B32", []>;
 +defm V_MAC_F32 : VOP2_32 <0x0000001f, "V_MAC_F32", []>;
@@ -20397,10 +21979,10 @@ index 0000000..005be96
 +//defm V_MBCNT_HI_U32_B32 : VOP2_32 <0x00000024, "V_MBCNT_HI_U32_B32", []>;
 +let Defs = [VCC] in { // Carry-out goes to VCC
 +defm V_ADD_I32 : VOP2_32 <0x00000025, "V_ADD_I32",
-+  [(set VReg_32:$dst, (add (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
++  [(set VReg_32:$dst, (add (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
 +>;
 +defm V_SUB_I32 : VOP2_32 <0x00000026, "V_SUB_I32",
-+  [(set VReg_32:$dst, (sub (i32 AllReg_32:$src0), (i32 VReg_32:$src1)))]
++  [(set VReg_32:$dst, (sub (i32 VSrc_32:$src0), (i32 VReg_32:$src1)))]
 +>;
 +} // End Defs = [VCC]
 +defm V_SUBREV_I32 : VOP2_32 <0x00000027, "V_SUBREV_I32", []>;
@@ -20412,7 +21994,7 @@ index 0000000..005be96
 +////def V_CVT_PKNORM_I16_F32 : VOP2_I16 <0x0000002d, "V_CVT_PKNORM_I16_F32", []>;
 +////def V_CVT_PKNORM_U16_F32 : VOP2_U16 <0x0000002e, "V_CVT_PKNORM_U16_F32", []>;
 +defm V_CVT_PKRTZ_F16_F32 : VOP2_32 <0x0000002f, "V_CVT_PKRTZ_F16_F32",
-+ [(set VReg_32:$dst, (int_SI_packf16 AllReg_32:$src0, VReg_32:$src1))]
++ [(set VReg_32:$dst, (int_SI_packf16 VSrc_32:$src0, VReg_32:$src1))]
 +>;
 +////def V_CVT_PK_U16_U32 : VOP2_U16 <0x00000030, "V_CVT_PK_U16_U32", []>;
 +////def V_CVT_PK_I16_I32 : VOP2_I16 <0x00000031, "V_CVT_PK_I16_I32", []>;
@@ -20482,6 +22064,10 @@ index 0000000..005be96
 +def V_MUL_LO_U32 : VOP3_32 <0x00000169, "V_MUL_LO_U32", []>;
 +def V_MUL_HI_U32 : VOP3_32 <0x0000016a, "V_MUL_HI_U32", []>;
 +def V_MUL_LO_I32 : VOP3_32 <0x0000016b, "V_MUL_LO_I32", []>;
++def : Pat <
++  (mul VSrc_32:$src0, VReg_32:$src1),
++  (V_MUL_LO_I32 VSrc_32:$src0, VReg_32:$src1, (IMPLICIT_DEF), 0, 0, 0, 0)
++>;
 +def V_MUL_HI_I32 : VOP3_32 <0x0000016c, "V_MUL_HI_I32", []>;
 +def V_DIV_SCALE_F32 : VOP3_32 <0x0000016d, "V_DIV_SCALE_F32", []>;
 +def V_DIV_SCALE_F64 : VOP3_64 <0x0000016e, "V_DIV_SCALE_F64", []>;
@@ -20519,13 +22105,20 @@ index 0000000..005be96
 +def S_AND_B32 : SOP2_32 <0x0000000e, "S_AND_B32", []>;
 +
 +def S_AND_B64 : SOP2_64 <0x0000000f, "S_AND_B64",
-+  [(set SReg_64:$dst, (and SReg_64:$src0, SReg_64:$src1))]
++  [(set SReg_64:$dst, (i64 (and SSrc_64:$src0, SSrc_64:$src1)))]
 +>;
-+def S_AND_VCC : SOP2_VCC <0x0000000f, "S_AND_B64",
-+  [(set SReg_1:$vcc, (SIvcc_and SReg_64:$src0, SReg_64:$src1))]
++
++def : Pat <
++  (i1 (and SSrc_64:$src0, SSrc_64:$src1)),
++  (S_AND_B64 SSrc_64:$src0, SSrc_64:$src1)
 +>;
++
 +def S_OR_B32 : SOP2_32 <0x00000010, "S_OR_B32", []>;
 +def S_OR_B64 : SOP2_64 <0x00000011, "S_OR_B64", []>;
++def : Pat <
++  (i1 (or SSrc_64:$src0, SSrc_64:$src1)),
++  (S_OR_B64 SSrc_64:$src0, SSrc_64:$src1)
++>;
 +def S_XOR_B32 : SOP2_32 <0x00000012, "S_XOR_B32", []>;
 +def S_XOR_B64 : SOP2_64 <0x00000013, "S_XOR_B64", []>;
 +def S_ANDN2_B32 : SOP2_32 <0x00000014, "S_ANDN2_B32", []>;
@@ -20554,48 +22147,6 @@ index 0000000..005be96
 +//def S_CBRANCH_G_FORK : SOP2_ <0x0000002b, "S_CBRANCH_G_FORK", []>;
 +def S_ABSDIFF_I32 : SOP2_32 <0x0000002c, "S_ABSDIFF_I32", []>;
 +
-+class V_MOV_IMM <Operand immType, SDNode immNode> : InstSI <
-+  (outs VReg_32:$dst),
-+  (ins immType:$src0),
-+  "V_MOV_IMM",
-+   [(set VReg_32:$dst, (immNode:$src0))]
-+>;
-+
-+let isCodeGenOnly = 1, isPseudo = 1 in {
-+
-+def V_MOV_IMM_I32 : V_MOV_IMM<i32imm, imm>;
-+def V_MOV_IMM_F32 : V_MOV_IMM<f32imm, fpimm>;
-+
-+def S_MOV_IMM_I32 : InstSI <
-+  (outs SReg_32:$dst),
-+  (ins i32imm:$src0),
-+  "S_MOV_IMM_I32",
-+  [(set SReg_32:$dst, (imm:$src0))]
-+>;
-+
-+// i64 immediates aren't really supported in hardware, but LLVM will use the i64
-+// type for indices on load and store instructions.  The pattern for
-+// S_MOV_IMM_I64 will only match i64 immediates that can fit into 32-bits,
-+// which the hardware can handle.
-+def S_MOV_IMM_I64 : InstSI <
-+  (outs SReg_64:$dst),
-+  (ins i64imm:$src0),
-+  "S_MOV_IMM_I64 $dst, $src0",
-+  [(set SReg_64:$dst, (IMM32bitIn64bit:$src0))]
-+>;
-+
-+} // End isCodeGenOnly, isPseudo = 1
-+
-+class SI_LOAD_LITERAL<Operand ImmType> :
-+    Enc32 <(outs), (ins ImmType:$imm), "LOAD_LITERAL $imm", []> {
-+
-+  bits<32> imm;
-+  let Inst{31-0} = imm;
-+}
-+
-+def SI_LOAD_LITERAL_I32 : SI_LOAD_LITERAL<i32imm>;
-+def SI_LOAD_LITERAL_F32 : SI_LOAD_LITERAL<f32imm>;
-+
 +let isCodeGenOnly = 1, isPseudo = 1 in {
 +
 +def SET_M0 : InstSI <
@@ -20614,13 +22165,6 @@ index 0000000..005be96
 +
 +let usesCustomInserter = 1 in {
 +
-+def SI_V_CNDLT : InstSI <
-+  (outs VReg_32:$dst),
-+  (ins VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
-+  "SI_V_CNDLT $dst, $src0, $src1, $src2",
-+  [(set VReg_32:$dst, (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2))]
-+>;
-+
 +def SI_INTERP : InstSI <
 +  (outs VReg_32:$dst),
 +  (ins VReg_32:$i, VReg_32:$j, i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
@@ -20628,21 +22172,6 @@ index 0000000..005be96
 +  []
 +>;
 +
-+def SI_INTERP_CONST : InstSI <
-+  (outs VReg_32:$dst),
-+  (ins i32imm:$attr_chan, i32imm:$attr, SReg_32:$params),
-+  "SI_INTERP_CONST $dst, $attr_chan, $attr, $params",
-+  [(set VReg_32:$dst, (int_SI_fs_interp_constant imm:$attr_chan,
-+                                                 imm:$attr, SReg_32:$params))]
-+>;
-+
-+def SI_KIL : InstSI <
-+  (outs),
-+  (ins VReg_32:$src),
-+  "SI_KIL $src",
-+  [(int_AMDGPU_kill VReg_32:$src)]
-+>;
-+
 +def SI_WQM : InstSI <
 +  (outs),
 +  (ins),
@@ -20662,9 +22191,9 @@ index 0000000..005be96
 +
 +def SI_IF : InstSI <
 +  (outs SReg_64:$dst),
-+  (ins SReg_1:$vcc, brtarget:$target),
++  (ins SReg_64:$vcc, brtarget:$target),
 +  "SI_IF",
-+  [(set SReg_64:$dst, (int_SI_if SReg_1:$vcc, bb:$target))]
++  [(set SReg_64:$dst, (int_SI_if SReg_64:$vcc, bb:$target))]
 +>;
 +
 +def SI_ELSE : InstSI <
@@ -20694,9 +22223,9 @@ index 0000000..005be96
 +
 +def SI_IF_BREAK : InstSI <
 +  (outs SReg_64:$dst),
-+  (ins SReg_1:$vcc, SReg_64:$src),
++  (ins SReg_64:$vcc, SReg_64:$src),
 +  "SI_IF_BREAK",
-+  [(set SReg_64:$dst, (int_SI_if_break SReg_1:$vcc, SReg_64:$src))]
++  [(set SReg_64:$dst, (int_SI_if_break SReg_64:$vcc, SReg_64:$src))]
 +>;
 +
 +def SI_ELSE_BREAK : InstSI <
@@ -20713,18 +22242,35 @@ index 0000000..005be96
 +  [(int_SI_end_cf SReg_64:$saved)]
 +>;
 +
++def SI_KILL : InstSI <
++  (outs),
++  (ins VReg_32:$src),
++  "SI_KIL $src",
++  [(int_AMDGPU_kill VReg_32:$src)]
++>;
++
 +} // end mayLoad = 1, mayStore = 1, hasSideEffects = 1
 +  // Uses = [EXEC], Defs = [EXEC]
 +
 +} // end IsCodeGenOnly, isPseudo
 +
++def : Pat<
++  (int_AMDGPU_cndlt VReg_32:$src0, VReg_32:$src1, VReg_32:$src2),
++  (V_CNDMASK_B32_e64 VReg_32:$src2, VReg_32:$src1, (V_CMP_GT_F32_e64 0, VReg_32:$src0))
++>;
++
++def : Pat <
++  (int_AMDGPU_kilp),
++  (SI_KILL (V_MOV_B32_e32 0xbf800000))
++>;
++
 +/* int_SI_vs_load_input */
 +def : Pat<
 +  (int_SI_vs_load_input SReg_128:$tlst, IMM12bit:$attr_offset,
 +                        VReg_32:$buf_idx_vgpr),
 +  (BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0,
 +                           VReg_32:$buf_idx_vgpr, SReg_128:$tlst,
-+                           0, 0, (i32 SREG_LIT_0))
++                           0, 0, 0)
 +>;
 +
 +/* int_SI_export */
@@ -20735,43 +22281,105 @@ index 0000000..005be96
 +       VReg_32:$src0, VReg_32:$src1, VReg_32:$src2, VReg_32:$src3)
 +>;
 +
-+/* int_SI_sample */
++
++/* int_SI_sample for simple 1D texture lookup */
 +def : Pat <
-+  (int_SI_sample imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+  (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
++  (int_SI_sample imm:$writemask, (v1i32 VReg_32:$addr),
++                 SReg_256:$rsrc, SReg_128:$sampler, imm),
++  (IMAGE_SAMPLE imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++                (i32 (COPY_TO_REGCLASS VReg_32:$addr, VReg_32)),
 +                SReg_256:$rsrc, SReg_128:$sampler)
 +>;
 +
-+/* int_SI_sample_lod */
-+def : Pat <
-+  (int_SI_sample_lod imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+  (IMAGE_SAMPLE_L imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
-+                  SReg_256:$rsrc, SReg_128:$sampler)
++class SamplePattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++                    ValueType addr_type> : Pat <
++    (name imm:$writemask, (addr_type addr_class:$addr),
++          SReg_256:$rsrc, SReg_128:$sampler, imm),
++    (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++          (EXTRACT_SUBREG addr_class:$addr, sub0),
++          SReg_256:$rsrc, SReg_128:$sampler)
 +>;
 +
-+/* int_SI_sample_bias */
-+def : Pat <
-+  (int_SI_sample_bias imm:$writemask, VReg_128:$coord, SReg_256:$rsrc, SReg_128:$sampler),
-+  (IMAGE_SAMPLE_B imm:$writemask, 0, 0, 0, 0, 0, 0, 0, VReg_128:$coord,
-+                  SReg_256:$rsrc, SReg_128:$sampler)
++class SampleRectPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++                        ValueType addr_type> : Pat <
++    (name imm:$writemask, (addr_type addr_class:$addr),
++          SReg_256:$rsrc, SReg_128:$sampler, TEX_RECT),
++    (opcode imm:$writemask, 1, 0, 0, 0, 0, 0, 0,
++          (EXTRACT_SUBREG addr_class:$addr, sub0),
++          SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleArrayPattern<Intrinsic name, MIMG opcode, RegisterClass addr_class,
++                         ValueType addr_type> : Pat <
++    (name imm:$writemask, (addr_type addr_class:$addr),
++          SReg_256:$rsrc, SReg_128:$sampler, TEX_ARRAY),
++    (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
++          (EXTRACT_SUBREG addr_class:$addr, sub0),
++          SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleShadowPattern<Intrinsic name, MIMG opcode,
++                          RegisterClass addr_class, ValueType addr_type> : Pat <
++    (name imm:$writemask, (addr_type addr_class:$addr),
++          SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW),
++    (opcode imm:$writemask, 0, 0, 0, 0, 0, 0, 0,
++          (EXTRACT_SUBREG addr_class:$addr, sub0),
++          SReg_256:$rsrc, SReg_128:$sampler)
++>;
++
++class SampleShadowArrayPattern<Intrinsic name, MIMG opcode,
++                               RegisterClass addr_class, ValueType addr_type> : Pat <
++    (name imm:$writemask, (addr_type addr_class:$addr),
++          SReg_256:$rsrc, SReg_128:$sampler, TEX_SHADOW_ARRAY),
++    (opcode imm:$writemask, 0, 0, 1, 0, 0, 0, 0,
++          (EXTRACT_SUBREG addr_class:$addr, sub0),
++          SReg_256:$rsrc, SReg_128:$sampler)
 +>;
 +
++/* int_SI_sample* intrinsics for texture lookups that take additional address parameters */
++multiclass SamplePatterns<RegisterClass addr_class, ValueType addr_type> {
++  def : SamplePattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++  def : SampleRectPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++  def : SampleArrayPattern <int_SI_sample, IMAGE_SAMPLE, addr_class, addr_type>;
++  def : SampleShadowPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
++  def : SampleShadowArrayPattern <int_SI_sample, IMAGE_SAMPLE_C, addr_class, addr_type>;
++
++  def : SamplePattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
++  def : SampleArrayPattern <int_SI_samplel, IMAGE_SAMPLE_L, addr_class, addr_type>;
++  def : SampleShadowPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
++  def : SampleShadowArrayPattern <int_SI_samplel, IMAGE_SAMPLE_C_L, addr_class, addr_type>;
++
++  def : SamplePattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
++  def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_class, addr_type>;
++  def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
++  def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_class, addr_type>;
++}
++
++defm : SamplePatterns<VReg_64, v2i32>;
++defm : SamplePatterns<VReg_128, v4i32>;
++defm : SamplePatterns<VReg_256, v8i32>;
++defm : SamplePatterns<VReg_512, v16i32>;
++
 +def CLAMP_SI : CLAMP<VReg_32>;
 +def FABS_SI : FABS<VReg_32>;
 +def FNEG_SI : FNEG<VReg_32>;
 +
-+def : Extract_Element <f32, v4f32, VReg_128, 0, sel_x>;
-+def : Extract_Element <f32, v4f32, VReg_128, 1, sel_y>;
-+def : Extract_Element <f32, v4f32, VReg_128, 2, sel_z>;
-+def : Extract_Element <f32, v4f32, VReg_128, 3, sel_w>;
++def : Extract_Element <f32, v4f32, VReg_128, 0, sub0>;
++def : Extract_Element <f32, v4f32, VReg_128, 1, sub1>;
++def : Extract_Element <f32, v4f32, VReg_128, 2, sub2>;
++def : Extract_Element <f32, v4f32, VReg_128, 3, sub3>;
 +
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sel_x>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sel_y>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sel_z>;
-+def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sel_w>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 4, sub0>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 5, sub1>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 6, sub2>;
++def : Insert_Element <f32, v4f32, VReg_32, VReg_128, 7, sub3>;
 +
++def : Vector1_Build <v1i32, VReg_32, i32, VReg_32>;
++def : Vector2_Build <v2i32, VReg_64, i32, VReg_32>;
 +def : Vector_Build <v4f32, VReg_128, f32, VReg_32>;
-+def : Vector_Build <v4i32, SReg_128, i32, SReg_32>;
++def : Vector_Build <v4i32, VReg_128, i32, VReg_32>;
++def : Vector8_Build <v8i32, VReg_256, i32, VReg_32>;
++def : Vector16_Build <v16i32, VReg_512, i32, VReg_32>;
 +
 +def : BitConvert <i32, f32, SReg_32>;
 +def : BitConvert <i32, f32, VReg_32>;
@@ -20779,24 +22387,46 @@ index 0000000..005be96
 +def : BitConvert <f32, i32, SReg_32>;
 +def : BitConvert <f32, i32, VReg_32>;
 +
++/********** ================== **********/
++/********** Immediate Patterns **********/
++/********** ================== **********/
++
++def : Pat <
++  (i1 imm:$imm),
++  (S_MOV_B64 imm:$imm)
++>;
++
++def : Pat <
++  (i32 imm:$imm),
++  (V_MOV_B32_e32 imm:$imm)
++>;
++
++def : Pat <
++  (f32 fpimm:$imm),
++  (V_MOV_B32_e32 fpimm:$imm)
++>;
++
 +def : Pat <
-+  (i64 (SIsreg1_bitcast SReg_1:$vcc)),
-+  (S_MOV_B64 (COPY_TO_REGCLASS SReg_1:$vcc, SReg_64))
++  (i32 imm:$imm),
++  (S_MOV_B32 imm:$imm)
 +>;
 +
 +def : Pat <
-+  (i1 (SIsreg1_bitcast SReg_64:$vcc)),
-+  (COPY_TO_REGCLASS SReg_64:$vcc, SReg_1)
++  (f32 fpimm:$imm),
++  (S_MOV_B32 fpimm:$imm)
 +>;
 +
 +def : Pat <
-+  (i64 (SIvcc_bitcast VCCReg:$vcc)),
-+  (S_MOV_B64 (COPY_TO_REGCLASS VCCReg:$vcc, SReg_64))
++  (i64 InlineImm<i64>:$imm),
++  (S_MOV_B64 InlineImm<i64>:$imm)
 +>;
 +
++// i64 immediates aren't supported by the hardware, so split them into two 32-bit values
 +def : Pat <
-+  (i1 (SIvcc_bitcast SReg_64:$vcc)),
-+  (COPY_TO_REGCLASS SReg_64:$vcc, VCCReg)
++  (i64 imm:$imm),
++  (INSERT_SUBREG (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
++    (S_MOV_B32 (i32 (LO32 imm:$imm))), sub0),
++    (S_MOV_B32 (i32 (HI32 imm:$imm))), sub1)
 +>;
 +
 +/********** ===================== **********/
@@ -20804,6 +22434,12 @@ index 0000000..005be96
 +/********** ===================== **********/
 +
 +def : Pat <
++  (int_SI_fs_interp_constant imm:$attr_chan, imm:$attr, SReg_32:$params),
++  (V_INTERP_MOV_F32 INTERP.P0, imm:$attr_chan, imm:$attr,
++                    (S_MOV_B32 SReg_32:$params))
++>;
++
++def : Pat <
 +  (int_SI_fs_interp_linear_center imm:$attr_chan, imm:$attr, SReg_32:$params),
 +  (SI_INTERP (f32 LINEAR_CENTER_I), (f32 LINEAR_CENTER_J), imm:$attr_chan,
 +             imm:$attr, SReg_32:$params)
@@ -20861,56 +22497,95 @@ index 0000000..005be96
 +def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_F32_e32, VReg_32>;
 +
 +def : Pat <
-+  (int_AMDGPU_div AllReg_32:$src0, AllReg_32:$src1),
-+  (V_MUL_LEGACY_F32_e32 AllReg_32:$src0, (V_RCP_LEGACY_F32_e32 AllReg_32:$src1))
++  (int_AMDGPU_div VSrc_32:$src0, VSrc_32:$src1),
++  (V_MUL_LEGACY_F32_e32 VSrc_32:$src0, (V_RCP_LEGACY_F32_e32 VSrc_32:$src1))
 +>;
 +
 +def : Pat<
-+  (fdiv AllReg_32:$src0, AllReg_32:$src1),
-+  (V_MUL_F32_e32 AllReg_32:$src0, (V_RCP_F32_e32 AllReg_32:$src1))
++  (fdiv VSrc_32:$src0, VSrc_32:$src1),
++  (V_MUL_F32_e32 VSrc_32:$src0, (V_RCP_F32_e32 VSrc_32:$src1))
 +>;
 +
 +def : Pat <
-+  (int_AMDGPU_kilp),
-+  (SI_KIL (V_MOV_IMM_I32 0xbf800000))
++  (fcos VSrc_32:$src0),
++  (V_COS_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
++>;
++
++def : Pat <
++  (fsin VSrc_32:$src0),
++  (V_SIN_F32_e32 (V_MUL_F32_e32 VSrc_32:$src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
 +>;
 +
 +def : Pat <
 +  (int_AMDGPU_cube VReg_128:$src),
 +  (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (INSERT_SUBREG (v4f32 (IMPLICIT_DEF)),
-+    (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+                  0, 0, 0, 0), sel_x),
-+    (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+                  0, 0, 0, 0), sel_y),
-+    (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+                  0, 0, 0, 0), sel_z),
-+    (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sel_x),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_y),
-+                  (EXTRACT_SUBREG VReg_128:$src, sel_z),
-+                  0, 0, 0, 0), sel_w)
++    (V_CUBETC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++                  (EXTRACT_SUBREG VReg_128:$src, sub1),
++                  (EXTRACT_SUBREG VReg_128:$src, sub2),
++                  0, 0, 0, 0), sub0),
++    (V_CUBESC_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++                  (EXTRACT_SUBREG VReg_128:$src, sub1),
++                  (EXTRACT_SUBREG VReg_128:$src, sub2),
++                  0, 0, 0, 0), sub1),
++    (V_CUBEMA_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++                  (EXTRACT_SUBREG VReg_128:$src, sub1),
++                  (EXTRACT_SUBREG VReg_128:$src, sub2),
++                  0, 0, 0, 0), sub2),
++    (V_CUBEID_F32 (EXTRACT_SUBREG VReg_128:$src, sub0),
++                  (EXTRACT_SUBREG VReg_128:$src, sub1),
++                  (EXTRACT_SUBREG VReg_128:$src, sub2),
++                  0, 0, 0, 0), sub3)
++>;
++
++def : Pat <
++  (i32 (sext (i1 SReg_64:$src0))),
++  (V_CNDMASK_B32_e64 (i32 0), (i32 -1), SReg_64:$src0)
 +>;
 +
 +/********** ================== **********/
 +/**********   VOP3 Patterns    **********/
 +/********** ================== **********/
 +
-+def : Pat <(f32 (IL_mad AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2)),
-+           (V_MAD_LEGACY_F32 AllReg_32:$src0, VReg_32:$src1, VReg_32:$src2,
++def : Pat <(f32 (IL_mad VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2)),
++           (V_MAD_LEGACY_F32 VSrc_32:$src0, VReg_32:$src1, VReg_32:$src2,
 +            0, 0, 0, 0)>;
 +
++/********** ================== **********/
++/**********   SMRD Patterns    **********/
++/********** ================== **********/
++
++multiclass SMRD_Pattern <SMRD Instr_IMM, SMRD Instr_SGPR, ValueType vt> {
++  // 1. Offset as an 8-bit DWORD immediate
++  def : Pat <
++    (constant_load (SIadd64bit32bit SReg_64:$sbase, IMM8bitDWORD:$offset)),
++    (vt (Instr_IMM SReg_64:$sbase, IMM8bitDWORD:$offset))
++  >;
++
++  // 2. Offset loaded into a 32-bit SGPR
++  def : Pat <
++    (constant_load (SIadd64bit32bit SReg_64:$sbase, imm:$offset)),
++    (vt (Instr_SGPR SReg_64:$sbase, (S_MOV_B32 imm:$offset)))
++  >;
++
++  // 3. No offset at all
++  def : Pat <
++    (constant_load SReg_64:$sbase),
++    (vt (Instr_IMM SReg_64:$sbase, 0))
++  >;
++}
++
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, f32>;
++defm : SMRD_Pattern <S_LOAD_DWORD_IMM, S_LOAD_DWORD_SGPR, i32>;
++defm : SMRD_Pattern <S_LOAD_DWORDX4_IMM, S_LOAD_DWORDX4_SGPR, v4i32>;
++defm : SMRD_Pattern <S_LOAD_DWORDX8_IMM, S_LOAD_DWORDX8_SGPR, v8i32>;
++
 +} // End isSI predicate
 diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
 new file mode 100644
-index 0000000..c322fef
+index 0000000..611b9c4
 --- /dev/null
 +++ b/lib/Target/R600/SIIntrinsics.td
-@@ -0,0 +1,52 @@
+@@ -0,0 +1,54 @@
 +//===-- SIIntrinsics.td - SI Intrinsic defs ----------------*- tablegen -*-===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -20935,9 +22610,11 @@ index 0000000..c322fef
 +  def int_SI_vs_load_input : Intrinsic <[llvm_v4f32_ty], [llvm_v4i32_ty, llvm_i16_ty, llvm_i32_ty], [IntrReadMem]> ;
 +  def int_SI_wqm : Intrinsic <[], [], []>;
 +
-+  def int_SI_sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
-+  def int_SI_sample_bias : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
-+  def int_SI_sample_lod : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_v4f32_ty, llvm_v8i32_ty, llvm_v4i32_ty], [IntrReadMem]>;
++  class Sample : Intrinsic <[llvm_v4f32_ty], [llvm_i32_ty, llvm_anyvector_ty, llvm_v8i32_ty, llvm_v4i32_ty, llvm_i32_ty], [IntrReadMem]>;
++
++  def int_SI_sample : Sample;
++  def int_SI_sampleb : Sample;
++  def int_SI_samplel : Sample;
 +
 +  /* Interpolation Intrinsics */
 +
@@ -20965,10 +22642,10 @@ index 0000000..c322fef
 +}
 diff --git a/lib/Target/R600/SILowerControlFlow.cpp b/lib/Target/R600/SILowerControlFlow.cpp
 new file mode 100644
-index 0000000..3fbe653
+index 0000000..2007d30
 --- /dev/null
 +++ b/lib/Target/R600/SILowerControlFlow.cpp
-@@ -0,0 +1,331 @@
+@@ -0,0 +1,372 @@
 +//===-- SILowerControlFlow.cpp - Use predicates for control flow ----------===//
 +//
 +//                     The LLVM Compiler Infrastructure
@@ -21039,7 +22716,10 @@ index 0000000..3fbe653
 +  static char ID;
 +  const TargetInstrInfo *TII;
 +
-+  void Skip(MachineInstr &MI, MachineOperand &To);
++  bool shouldSkip(MachineBasicBlock *From, MachineBasicBlock *To);
++
++  void Skip(MachineInstr &From, MachineOperand &To);
++  void SkipIfDead(MachineInstr &MI);
 +
 +  void If(MachineInstr &MI);
 +  void Else(MachineInstr &MI);
@@ -21049,6 +22729,7 @@ index 0000000..3fbe653
 +  void Loop(MachineInstr &MI);
 +  void EndCf(MachineInstr &MI);
 +
++  void Kill(MachineInstr &MI);
 +  void Branch(MachineInstr &MI);
 +
 +public:
@@ -21071,22 +22752,29 @@ index 0000000..3fbe653
 +  return new SILowerControlFlowPass(tm);
 +}
 +
-+void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
++bool SILowerControlFlowPass::shouldSkip(MachineBasicBlock *From,
++                                        MachineBasicBlock *To) {
++
 +  unsigned NumInstr = 0;
 +
-+  for (MachineBasicBlock *MBB = *From.getParent()->succ_begin();
-+       NumInstr < SkipThreshold && MBB != To.getMBB() && !MBB->succ_empty();
++  for (MachineBasicBlock *MBB = From; MBB != To && !MBB->succ_empty();
 +       MBB = *MBB->succ_begin()) {
 +
 +    for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end();
 +         NumInstr < SkipThreshold && I != E; ++I) {
 +
 +      if (I->isBundle() || !I->isBundled())
-+        ++NumInstr;
++        if (++NumInstr >= SkipThreshold)
++          return true;
 +    }
 +  }
 +
-+  if (NumInstr < SkipThreshold)
++  return false;
++}
++
++void SILowerControlFlowPass::Skip(MachineInstr &From, MachineOperand &To) {
++
++  if (!shouldSkip(*From.getParent()->succ_begin(), To.getMBB()))
 +    return;
 +
 +  DebugLoc DL = From.getDebugLoc();
@@ -21095,6 +22783,38 @@ index 0000000..3fbe653
 +          .addReg(AMDGPU::EXEC);
 +}
 +
++void SILowerControlFlowPass::SkipIfDead(MachineInstr &MI) {
++
++  MachineBasicBlock &MBB = *MI.getParent();
++  DebugLoc DL = MI.getDebugLoc();
++
++  if (!shouldSkip(&MBB, &MBB.getParent()->back()))
++    return;
++
++  MachineBasicBlock::iterator Insert = &MI;
++  ++Insert;
++
++  // If the exec mask is non-zero, skip the next two instructions
++  BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
++          .addImm(3)
++          .addReg(AMDGPU::EXEC);
++
++  // Exec mask is zero: Export to NULL target...
++  BuildMI(MBB, Insert, DL, TII->get(AMDGPU::EXP))
++          .addImm(0)
++          .addImm(0x09) // V_008DFC_SQ_EXP_NULL
++          .addImm(0)
++          .addImm(1)
++          .addImm(1)
++          .addReg(AMDGPU::VGPR0)
++          .addReg(AMDGPU::VGPR0)
++          .addReg(AMDGPU::VGPR0)
++          .addReg(AMDGPU::VGPR0);
++
++  // ... and terminate wavefront
++  BuildMI(MBB, Insert, DL, TII->get(AMDGPU::S_ENDPGM));
++}
++
 +void SILowerControlFlowPass::If(MachineInstr &MI) {
 +  MachineBasicBlock &MBB = *MI.getParent();
 +  DebugLoc DL = MI.getDebugLoc();
@@ -21213,8 +22933,28 @@ index 0000000..3fbe653
 +    assert(0);
 +}
 +
++void SILowerControlFlowPass::Kill(MachineInstr &MI) {
++
++  MachineBasicBlock &MBB = *MI.getParent();
++  DebugLoc DL = MI.getDebugLoc();
++
++  // Kill is only allowed in pixel shaders
++  MachineFunction &MF = *MBB.getParent();
++  SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
++  assert(Info->ShaderType == ShaderType::PIXEL);
++
++  // Clear this pixel from the exec mask if the operand is negative
++  BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32), AMDGPU::VCC)
++          .addImm(0)
++          .addOperand(MI.getOperand(0));
++
++  MI.eraseFromParent();
++}
++
 +bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
-+  bool HaveCf = false;
++
++  bool HaveKill = false;
++  unsigned Depth = 0;
 +
 +  for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
 +       BI != BE; ++BI) {
@@ -21228,6 +22968,7 @@ index 0000000..3fbe653
 +      switch (MI.getOpcode()) {
 +        default: break;
 +        case AMDGPU::SI_IF:
++          ++Depth;
 +          If(MI);
 +          break;
 +
@@ -21248,171 +22989,34 @@ index 0000000..3fbe653
 +          break;
 +
 +        case AMDGPU::SI_LOOP:
++          ++Depth;
 +          Loop(MI);
 +          break;
 +
-+        case AMDGPU::SI_END_CF:
-+          HaveCf = true;
-+          EndCf(MI);
-+          break;
-+
-+        case AMDGPU::S_BRANCH:
-+          Branch(MI);
-+          break;
-+      }
-+    }
-+  }
-+
-+  // TODO: What is this good for?
-+  unsigned ShaderType = MF.getInfo<SIMachineFunctionInfo>()->ShaderType;
-+  if (HaveCf && ShaderType == ShaderType::PIXEL) {
-+    for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
-+         BI != BE; ++BI) {
-+
-+      MachineBasicBlock &MBB = *BI;
-+      if (MBB.succ_empty()) {
-+
-+        MachineInstr &MI = *MBB.getFirstNonPHI();
-+        DebugLoc DL = MI.getDebugLoc();
-+
-+        // If the exec mask is non-zero, skip the next two instructions
-+        BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_CBRANCH_EXECNZ))
-+               .addImm(3)
-+               .addReg(AMDGPU::EXEC);
-+
-+        // Exec mask is zero: Export to NULL target...
-+        BuildMI(MBB, &MI, DL, TII->get(AMDGPU::EXP))
-+                .addImm(0)
-+                .addImm(0x09) // V_008DFC_SQ_EXP_NULL
-+                .addImm(0)
-+                .addImm(1)
-+                .addImm(1)
-+                .addReg(AMDGPU::SREG_LIT_0)
-+                .addReg(AMDGPU::SREG_LIT_0)
-+                .addReg(AMDGPU::SREG_LIT_0)
-+                .addReg(AMDGPU::SREG_LIT_0);
-+
-+        // ... and terminate wavefront
-+        BuildMI(MBB, &MI, DL, TII->get(AMDGPU::S_ENDPGM));
-+      }
-+    }
-+  }
-+
-+  return true;
-+}
-diff --git a/lib/Target/R600/SILowerLiteralConstants.cpp b/lib/Target/R600/SILowerLiteralConstants.cpp
-new file mode 100644
-index 0000000..c0411e9
---- /dev/null
-+++ b/lib/Target/R600/SILowerLiteralConstants.cpp
-@@ -0,0 +1,108 @@
-+//===-- SILowerLiteralConstants.cpp - Lower intrs using literal constants--===//
-+//
-+//                     The LLVM Compiler Infrastructure
-+//
-+// This file is distributed under the University of Illinois Open Source
-+// License. See LICENSE.TXT for details.
-+//
-+//===----------------------------------------------------------------------===//
-+//
-+/// \file
-+/// \brief This pass performs the following transformation on instructions with
-+/// literal constants:
-+///
-+/// %VGPR0 = V_MOV_IMM_I32 1
-+///
-+/// becomes:
-+///
-+/// BUNDLE
-+///   * %VGPR = V_MOV_B32_32 SI_LITERAL_CONSTANT
-+///   * SI_LOAD_LITERAL 1
-+///
-+/// The resulting sequence matches exactly how the hardware handles immediate
-+/// operands, so this transformation greatly simplifies the code generator.
-+///
-+/// Only the *_MOV_IMM_* support immediate operands at the moment, but when
-+/// support for immediate operands is added to other instructions, they
-+/// will be lowered here as well.
-+//===----------------------------------------------------------------------===//
-+
-+#include "AMDGPU.h"
-+#include "llvm/CodeGen/MachineFunction.h"
-+#include "llvm/CodeGen/MachineFunctionPass.h"
-+#include "llvm/CodeGen/MachineInstrBuilder.h"
-+#include "llvm/CodeGen/MachineInstrBundle.h"
-+
-+using namespace llvm;
-+
-+namespace {
-+
-+class SILowerLiteralConstantsPass : public MachineFunctionPass {
-+
-+private:
-+  static char ID;
-+  const TargetInstrInfo *TII;
-+
-+public:
-+  SILowerLiteralConstantsPass(TargetMachine &tm) :
-+    MachineFunctionPass(ID), TII(tm.getInstrInfo()) { }
-+
-+  virtual bool runOnMachineFunction(MachineFunction &MF);
-+
-+  const char *getPassName() const {
-+    return "SI Lower literal constants pass";
-+  }
-+};
-+
-+} // End anonymous namespace
-+
-+char SILowerLiteralConstantsPass::ID = 0;
-+
-+FunctionPass *llvm::createSILowerLiteralConstantsPass(TargetMachine &tm) {
-+  return new SILowerLiteralConstantsPass(tm);
-+}
-+
-+bool SILowerLiteralConstantsPass::runOnMachineFunction(MachineFunction &MF) {
-+  for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
-+                                                  BB != BB_E; ++BB) {
-+    MachineBasicBlock &MBB = *BB;
-+    for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
-+                               I != MBB.end(); I = Next) {
-+      Next = llvm::next(I);
-+      MachineInstr &MI = *I;
-+      switch (MI.getOpcode()) {
-+      default: break;
-+      case AMDGPU::S_MOV_IMM_I32:
-+      case AMDGPU::S_MOV_IMM_I64:
-+      case AMDGPU::V_MOV_IMM_F32:
-+      case AMDGPU::V_MOV_IMM_I32: {
-+          unsigned MovOpcode;
-+          unsigned LoadLiteralOpcode;
-+          MachineOperand LiteralOp = MI.getOperand(1);
-+          if (AMDGPU::VReg_32RegClass.contains(MI.getOperand(0).getReg())) {
-+            MovOpcode = AMDGPU::V_MOV_B32_e32;
-+          } else {
-+            MovOpcode = AMDGPU::S_MOV_B32;
-+          }
-+          if (LiteralOp.isImm()) {
-+            LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_I32;
-+          } else {
-+            LoadLiteralOpcode = AMDGPU::SI_LOAD_LITERAL_F32;
++        case AMDGPU::SI_END_CF:
++          if (--Depth == 0 && HaveKill) {
++            SkipIfDead(MI);
++            HaveKill = false;
 +          }
-+          MIBundleBuilder Bundle(MBB, I);
-+          Bundle
-+            .append(BuildMI(MF, MBB.findDebugLoc(I), TII->get(MovOpcode),
-+                            MI.getOperand(0).getReg())
-+                    .addReg(AMDGPU::SI_LITERAL_CONSTANT))
-+            .append(BuildMI(MF, MBB.findDebugLoc(I),
-+                            TII->get(LoadLiteralOpcode))
-+                    .addOperand(MI.getOperand(1)));
-+          llvm::finalizeBundle(MBB, Bundle.begin());
-+          MI.eraseFromParent();
++          EndCf(MI);
++          break;
++
++        case AMDGPU::SI_KILL:
++          if (Depth == 0)
++            SkipIfDead(MI);
++          else
++            HaveKill = true;
++          Kill(MI);
++          break;
++
++        case AMDGPU::S_BRANCH:
++          Branch(MI);
 +          break;
-+        }
 +      }
 +    }
 +  }
-+  return false;
++
++  return true;
 +}
 diff --git a/lib/Target/R600/SIMachineFunctionInfo.cpp b/lib/Target/R600/SIMachineFunctionInfo.cpp
 new file mode 100644
@@ -21589,24 +23193,10 @@ index 0000000..40171e4
 +#endif // SIREGISTERINFO_H_
 diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td
 new file mode 100644
-index 0000000..c3f1361
+index 0000000..ab36b87
 --- /dev/null
 +++ b/lib/Target/R600/SIRegisterInfo.td
-@@ -0,0 +1,167 @@
-+
-+let Namespace = "AMDGPU" in {
-+  def low : SubRegIndex;
-+  def high : SubRegIndex;
-+
-+  def sub0 : SubRegIndex;
-+  def sub1 : SubRegIndex;
-+  def sub2 : SubRegIndex;
-+  def sub3 : SubRegIndex;
-+  def sub4 : SubRegIndex;
-+  def sub5 : SubRegIndex;
-+  def sub6 : SubRegIndex;
-+  def sub7 : SubRegIndex;
-+}
+@@ -0,0 +1,190 @@
 +
 +class SIReg <string n, bits<16> encoding = 0> : Register<n> {
 +  let Namespace = "AMDGPU";
@@ -21615,13 +23205,15 @@ index 0000000..c3f1361
 +
 +class SI_64 <string n, list<Register> subregs, bits<16> encoding> : RegisterWithSubRegs<n, subregs> {
 +  let Namespace = "AMDGPU";
-+  let SubRegIndices = [low, high];
++  let SubRegIndices = [sub0, sub1];
 +  let HWEncoding = encoding;
 +}
 +
 +class SGPR_32 <bits<16> num, string name> : SIReg<name, num>;
 +
-+class VGPR_32 <bits<16> num, string name> : SIReg<name, num>;
++class VGPR_32 <bits<16> num, string name> : SIReg<name, num> {
++  let HWEncoding{8} = 1;
++}
 +
 +// Special Registers
 +def VCC : SIReg<"VCC", 106>;
@@ -21629,8 +23221,6 @@ index 0000000..c3f1361
 +def EXEC_HI : SIReg <"EXEC HI", 127>;
 +def EXEC : SI_64<"EXEC", [EXEC_LO, EXEC_HI], 126>;
 +def SCC : SIReg<"SCC", 253>;
-+def SREG_LIT_0 : SIReg <"S LIT 0", 128>;
-+def SI_LITERAL_CONSTANT : SIReg<"LITERAL CONSTANT", 255>;
 +def M0 : SIReg <"M0", 124>;
 +
 +//Interpolation registers
@@ -21668,12 +23258,12 @@ index 0000000..c3f1361
 +                            (add (sequence "SGPR%u", 0, 101))>;
 +
 +// SGPR 64-bit registers
-+def SGPR_64 : RegisterTuples<[low, high],
++def SGPR_64 : RegisterTuples<[sub0, sub1],
 +                             [(add (decimate SGPR_32, 2)),
 +                              (add(decimate (rotl SGPR_32, 1), 2))]>;
 +
 +// SGPR 128-bit registers
-+def SGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
++def SGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
 +                              [(add (decimate SGPR_32, 4)),
 +                               (add (decimate (rotl SGPR_32, 1), 4)),
 +                               (add (decimate (rotl SGPR_32, 2), 4)),
@@ -21699,32 +23289,61 @@ index 0000000..c3f1361
 +                            (add (sequence "VGPR%u", 0, 255))>;
 +
 +// VGPR 64-bit registers
-+def VGPR_64 : RegisterTuples<[low, high],
++def VGPR_64 : RegisterTuples<[sub0, sub1],
 +                             [(add VGPR_32),
 +                              (add (rotl VGPR_32, 1))]>;
 +
 +// VGPR 128-bit registers
-+def VGPR_128 : RegisterTuples<[sel_x, sel_y, sel_z, sel_w],
++def VGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
 +                              [(add VGPR_32),
 +                               (add (rotl VGPR_32, 1)),
 +                               (add (rotl VGPR_32, 2)),
 +                               (add (rotl VGPR_32, 3))]>;
 +
++// VGPR 256-bit registers
++def VGPR_256 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
++                              [(add VGPR_32),
++                               (add (rotl VGPR_32, 1)),
++                               (add (rotl VGPR_32, 2)),
++                               (add (rotl VGPR_32, 3)),
++                               (add (rotl VGPR_32, 4)),
++                               (add (rotl VGPR_32, 5)),
++                               (add (rotl VGPR_32, 6)),
++                               (add (rotl VGPR_32, 7))]>;
++
++// VGPR 512-bit registers
++def VGPR_512 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7,
++                               sub8, sub9, sub10, sub11, sub12, sub13, sub14, sub15],
++                              [(add VGPR_32),
++                               (add (rotl VGPR_32, 1)),
++                               (add (rotl VGPR_32, 2)),
++                               (add (rotl VGPR_32, 3)),
++                               (add (rotl VGPR_32, 4)),
++                               (add (rotl VGPR_32, 5)),
++                               (add (rotl VGPR_32, 6)),
++                               (add (rotl VGPR_32, 7)),
++                               (add (rotl VGPR_32, 8)),
++                               (add (rotl VGPR_32, 9)),
++                               (add (rotl VGPR_32, 10)),
++                               (add (rotl VGPR_32, 11)),
++                               (add (rotl VGPR_32, 12)),
++                               (add (rotl VGPR_32, 13)),
++                               (add (rotl VGPR_32, 14)),
++                               (add (rotl VGPR_32, 15))]>;
++
 +// Register class for all scalar registers (SGPRs + Special Registers)
 +def SReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
-+    (add SGPR_32,  SREG_LIT_0, M0, EXEC_LO, EXEC_HI)
++    (add SGPR_32, M0, EXEC_LO, EXEC_HI)
 +>;
 +
-+def SReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add SGPR_64, VCC, EXEC)>;
-+
-+def SReg_1 : RegisterClass<"AMDGPU", [i1], 1, (add VCC, SGPR_64, EXEC)>;
++def SReg_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SGPR_64, VCC, EXEC)>;
 +
 +def SReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add SGPR_128)>;
 +
 +def SReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add SGPR_256)>;
 +
 +// Register class for all vector registers (VGPRs + Interpolation Registers)
-+def VReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
++def VReg_32 : RegisterClass<"AMDGPU", [f32, i32, v1i32], 32,
 +    (add VGPR_32,
 +    PERSP_SAMPLE_I, PERSP_SAMPLE_J,
 +    PERSP_CENTER_I, PERSP_CENTER_J,
@@ -21745,14 +23364,22 @@ index 0000000..c3f1361
 +    )
 +>;
 +
-+def VReg_64 : RegisterClass<"AMDGPU", [i64], 64, (add VGPR_64)>;
++def VReg_64 : RegisterClass<"AMDGPU", [i64, v2i32], 64, (add VGPR_64)>;
++
++def VReg_128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128, (add VGPR_128)>;
++
++def VReg_256 : RegisterClass<"AMDGPU", [v8i32], 256, (add VGPR_256)>;
 +
-+def VReg_128 : RegisterClass<"AMDGPU", [v4f32], 128, (add VGPR_128)>;
++def VReg_512 : RegisterClass<"AMDGPU", [v16i32], 512, (add VGPR_512)>;
 +
-+// AllReg_* - A set of all scalar and vector registers of a given width.
-+def AllReg_32 : RegisterClass<"AMDGPU", [f32, i32], 32, (add VReg_32, SReg_32)>;
++// [SV]Src_* operands can be either an immediate or a register
++def SSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add SReg_32)>;
 +
-+def AllReg_64 : RegisterClass<"AMDGPU", [f64, i64], 64, (add SReg_64, VReg_64)>;
++def SSrc_64 : RegisterClass<"AMDGPU", [i1, i64], 64, (add SReg_64)>;
++
++def VSrc_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add VReg_32, SReg_32)>;
++
++def VSrc_64 : RegisterClass<"AMDGPU", [i64], 64, (add SReg_64, VReg_64)>;
 +
 +// Special register classes for predicates and the M0 register
 +def SCCReg : RegisterClass<"AMDGPU", [i1], 1, (add SCC)>;
@@ -21876,6 +23503,30 @@ index 0000000..b8ac4e7
 +CPPFLAGS = -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
 +
 +include $(LEVEL)/Makefile.common
+diff --git a/test/CodeGen/R600/128bit-kernel-args.ll b/test/CodeGen/R600/128bit-kernel-args.ll
+new file mode 100644
+index 0000000..114f9e7
+--- /dev/null
++++ b/test/CodeGen/R600/128bit-kernel-args.ll
+@@ -0,0 +1,18 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; CHECK: @v4i32_kernel_arg
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
++
++define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32>  %in) {
++entry:
++  store <4 x i32> %in, <4 x i32> addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @v4f32_kernel_arg
++; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
++define void @v4f32_kernel_arg(<4 x float> addrspace(1)* %out, <4 x float>  %in) {
++entry:
++  store <4 x float> %in, <4 x float> addrspace(1)* %out
++  ret void
++}
 diff --git a/test/CodeGen/R600/add.v4i32.ll b/test/CodeGen/R600/add.v4i32.ll
 new file mode 100644
 index 0000000..ac4a874
@@ -21918,6 +23569,82 @@ index 0000000..662085e
 +  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
 +  ret void
 +}
+diff --git a/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
+new file mode 100644
+index 0000000..fd958b3
+--- /dev/null
++++ b/test/CodeGen/R600/dagcombiner-bug-illegal-vec4-int-to-fp.ll
+@@ -0,0 +1,36 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; This test is for a bug in
++; DAGCombiner::reduceBuildVecConvertToConvertBuildVec() where
++; the wrong type was being passed to
++; TargetLowering::getOperationAction() when checking the legality of
++; ISD::UINT_TO_FP and ISD::SINT_TO_FP opcodes.
++
++
++; CHECK: @sint
++; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++
++define void @sint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
++entry:
++  %ptr = getelementptr i32 addrspace(1)* %in, i32 1
++  %sint = load i32 addrspace(1) * %in
++  %conv = sitofp i32 %sint to float
++  %0 = insertelement <4 x float> undef, float %conv, i32 0
++  %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
++  store <4 x float> %splat, <4 x float> addrspace(1)* %out
++  ret void
++}
++
++;CHECK: @uint
++;CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++
++define void @uint(<4 x float> addrspace(1)* %out, i32 addrspace(1)* %in) {
++entry:
++  %ptr = getelementptr i32 addrspace(1)* %in, i32 1
++  %uint = load i32 addrspace(1) * %in
++  %conv = uitofp i32 %uint to float
++  %0 = insertelement <4 x float> undef, float %conv, i32 0
++  %splat = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> zeroinitializer
++  store <4 x float> %splat, <4 x float> addrspace(1)* %out
++  ret void
++}
+diff --git a/test/CodeGen/R600/disconnected-predset-break-bug.ll b/test/CodeGen/R600/disconnected-predset-break-bug.ll
+new file mode 100644
+index 0000000..a586742
+--- /dev/null
++++ b/test/CodeGen/R600/disconnected-predset-break-bug.ll
+@@ -0,0 +1,28 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; PRED_SET* instructions must be tied to any instruction that uses their
++; result.  This tests that there are no instructions between the PRED_SET*
++; and the PREDICATE_BREAK in this loop.
++
++; CHECK: @loop_ge
++; CHECK: WHILE
++; CHECK: PRED_SET
++; CHECK-NEXT: PREDICATED_BREAK
++define void @loop_ge(i32 addrspace(1)* nocapture %out, i32 %iterations) nounwind {
++entry:
++  %cmp5 = icmp sgt i32 %iterations, 0
++  br i1 %cmp5, label %for.body, label %for.end
++
++for.body:                                         ; preds = %for.body, %entry
++  %i.07.in = phi i32 [ %i.07, %for.body ], [ %iterations, %entry ]
++  %ai.06 = phi i32 [ %add, %for.body ], [ 0, %entry ]
++  %i.07 = add nsw i32 %i.07.in, -1
++  %arrayidx = getelementptr inbounds i32 addrspace(1)* %out, i32 %ai.06
++  store i32 %i.07, i32 addrspace(1)* %arrayidx, align 4
++  %add = add nsw i32 %ai.06, 1
++  %exitcond = icmp eq i32 %add, %iterations
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:                                          ; preds = %for.body, %entry
++  ret void
++}
 diff --git a/test/CodeGen/R600/fabs.ll b/test/CodeGen/R600/fabs.ll
 new file mode 100644
 index 0000000..0407533
@@ -22027,15 +23754,13 @@ index 0000000..5c981ef
 +}
 diff --git a/test/CodeGen/R600/fcmp.ll b/test/CodeGen/R600/fcmp.ll
 new file mode 100644
-index 0000000..1dcd07c
+index 0000000..89f5e9e
 --- /dev/null
 +++ b/test/CodeGen/R600/fcmp.ll
-@@ -0,0 +1,16 @@
+@@ -0,0 +1,14 @@
 +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +
-+;CHECK: SETE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-+;CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
-+;CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++;CHECK: SETE_DX10 T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +
 +define void @test(i32 addrspace(1)* %out, float addrspace(1)* %in) {
 +entry:
@@ -22183,14 +23908,13 @@ index 0000000..6d44a0c
 +}
 diff --git a/test/CodeGen/R600/fsub.ll b/test/CodeGen/R600/fsub.ll
 new file mode 100644
-index 0000000..0ec1c37
+index 0000000..591aa52
 --- /dev/null
 +++ b/test/CodeGen/R600/fsub.ll
-@@ -0,0 +1,17 @@
+@@ -0,0 +1,16 @@
 +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +
-+; CHECK: MOV T{{[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
-+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
++; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
 +
 +define void @test() {
 +   %r0 = call float @llvm.R600.load.input(i32 0)
@@ -22266,6 +23990,64 @@ index 0000000..aad44d9
 +  store i32 %value, i32 addrspace(1)* %out
 +  ret void
 +}
+diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
+new file mode 100644
+index 0000000..382f78c
+--- /dev/null
++++ b/test/CodeGen/R600/kcache-fold.ll
+@@ -0,0 +1,52 @@
++;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
++
++define void @main() {
++main_body:
++  %0 = load <4 x float> addrspace(9)* null
++  %1 = extractelement <4 x float> %0, i32 0
++  %2 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++  %3 = extractelement <4 x float> %2, i32 0
++  %4 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++  %5 = extractelement <4 x float> %4, i32 0
++  %6 = fcmp ult float %1, 0.000000e+00
++  %7 = select i1 %6, float %3, float %5
++  %8 = load <4 x float> addrspace(9)* null
++  %9 = extractelement <4 x float> %8, i32 1
++  %10 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++  %11 = extractelement <4 x float> %10, i32 1
++  %12 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++  %13 = extractelement <4 x float> %12, i32 1
++  %14 = fcmp ult float %9, 0.000000e+00
++  %15 = select i1 %14, float %11, float %13
++  %16 = load <4 x float> addrspace(9)* null
++  %17 = extractelement <4 x float> %16, i32 2
++  %18 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++  %19 = extractelement <4 x float> %18, i32 2
++  %20 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++  %21 = extractelement <4 x float> %20, i32 2
++  %22 = fcmp ult float %17, 0.000000e+00
++  %23 = select i1 %22, float %19, float %21
++  %24 = load <4 x float> addrspace(9)* null
++  %25 = extractelement <4 x float> %24, i32 3
++  %26 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 1)
++  %27 = extractelement <4 x float> %26, i32 3
++  %28 = load <4 x float> addrspace(9)* getelementptr ([1024 x <4 x float>] addrspace(9)* null, i64 0, i32 2)
++  %29 = extractelement <4 x float> %28, i32 3
++  %30 = fcmp ult float %25, 0.000000e+00
++  %31 = select i1 %30, float %27, float %29
++  %32 = call float @llvm.AMDIL.clamp.(float %7, float 0.000000e+00, float 1.000000e+00)
++  %33 = call float @llvm.AMDIL.clamp.(float %15, float 0.000000e+00, float 1.000000e+00)
++  %34 = call float @llvm.AMDIL.clamp.(float %23, float 0.000000e+00, float 1.000000e+00)
++  %35 = call float @llvm.AMDIL.clamp.(float %31, float 0.000000e+00, float 1.000000e+00)
++  %36 = insertelement <4 x float> undef, float %32, i32 0
++  %37 = insertelement <4 x float> %36, float %33, i32 1
++  %38 = insertelement <4 x float> %37, float %34, i32 2
++  %39 = insertelement <4 x float> %38, float %35, i32 3
++  call void @llvm.R600.store.swizzle(<4 x float> %39, i32 0, i32 0)
++  ret void
++}
++
++declare float @llvm.AMDIL.clamp.(float, float, float) readnone
++declare void @llvm.R600.store.swizzle(<4 x float>, i32, i32)
 diff --git a/test/CodeGen/R600/lit.local.cfg b/test/CodeGen/R600/lit.local.cfg
 new file mode 100644
 index 0000000..36ee493
@@ -22287,10 +24069,10 @@ index 0000000..36ee493
 +
 diff --git a/test/CodeGen/R600/literals.ll b/test/CodeGen/R600/literals.ll
 new file mode 100644
-index 0000000..4c731b2
+index 0000000..be62342
 --- /dev/null
 +++ b/test/CodeGen/R600/literals.ll
-@@ -0,0 +1,30 @@
+@@ -0,0 +1,32 @@
 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +
 +; Test using an integer literal constant.
@@ -22299,6 +24081,7 @@ index 0000000..4c731b2
 +; or
 +; ADD_INT literal.x REG, 5
 +
++; CHECK: @i32_literal
 +; CHECK: ADD_INT {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} 5
 +define void @i32_literal(i32 addrspace(1)* %out, i32 %in) {
 +entry:
@@ -22313,6 +24096,7 @@ index 0000000..4c731b2
 +; or
 +; ADD literal.x REG, 5.0
 +
++; CHECK: @float_literal
 +; CHECK: ADD {{[A-Z0-9,. ]*}}literal.x,{{[A-Z0-9,. ]*}} {{[0-9]+}}(5.0
 +define void @float_literal(float addrspace(1)* %out, float %in) {
 +entry:
@@ -22366,6 +24150,35 @@ index 0000000..fac957f
 +declare void @llvm.AMDGPU.store.output(float, i32)
 +
 +declare float @llvm.AMDGPU.trunc(float ) readnone
+diff --git a/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
+new file mode 100644
+index 0000000..0c19f14
+--- /dev/null
++++ b/test/CodeGen/R600/llvm.SI.fs.interp.constant.ll
+@@ -0,0 +1,23 @@
++;RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
++
++;CHECK: S_MOV_B32
++;CHECK-NEXT: V_INTERP_MOV_F32
++
++define void @main() {
++main_body:
++  call void @llvm.AMDGPU.shader.type(i32 0)
++  %0 = load i32 addrspace(8)* inttoptr (i32 6 to i32 addrspace(8)*)
++  %1 = call float @llvm.SI.fs.interp.constant(i32 0, i32 0, i32 %0)
++  %2 = call i32 @llvm.SI.packf16(float %1, float %1)
++  %3 = bitcast i32 %2 to float
++  call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %3, float %3, float %3, float %3)
++  ret void
++}
++
++declare void @llvm.AMDGPU.shader.type(i32)
++
++declare float @llvm.SI.fs.interp.constant(i32, i32, i32) readonly
++
++declare i32 @llvm.SI.packf16(float, float) readnone
++
++declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
 diff --git a/test/CodeGen/R600/llvm.cos.ll b/test/CodeGen/R600/llvm.cos.ll
 new file mode 100644
 index 0000000..dc120bf
@@ -22466,6 +24279,112 @@ index 0000000..b070dcd
 +  store i32 %2, i32 addrspace(1)* %out
 +  ret void
 +}
+diff --git a/test/CodeGen/R600/predicates.ll b/test/CodeGen/R600/predicates.ll
+new file mode 100644
+index 0000000..18895a4
+--- /dev/null
++++ b/test/CodeGen/R600/predicates.ll
+@@ -0,0 +1,100 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests make sure the compiler is optimizing branches using predicates
++; when it is legal to do so.
++
++; CHECK: @simple_if
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++define void @simple_if(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp sgt i32 %in, 0
++  br i1 %0, label %IF, label %ENDIF
++
++IF:
++  %1 = shl i32 %in, 1
++  br label %ENDIF
++
++ENDIF:
++  %2 = phi i32 [ %in, %entry ], [ %1, %IF ]
++  store i32 %2, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @simple_if_else
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++define void @simple_if_else(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp sgt i32 %in, 0
++  br i1 %0, label %IF, label %ELSE
++
++IF:
++  %1 = shl i32 %in, 1
++  br label %ENDIF
++
++ELSE:
++  %2 = lshr i32 %in, 1
++  br label %ENDIF
++
++ENDIF:
++  %3 = phi i32 [ %1, %IF ], [ %2, %ELSE ]
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @nested_if
++; CHECK: IF_PREDICATE_SET
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSHL T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: ENDIF
++define void @nested_if(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp sgt i32 %in, 0
++  br i1 %0, label %IF0, label %ENDIF
++
++IF0:
++  %1 = add i32 %in, 10
++  %2 = icmp sgt i32 %1, 0
++  br i1 %2, label %IF1, label %ENDIF
++
++IF1:
++  %3 = shl i32  %1, 1
++  br label %ENDIF
++
++ENDIF:
++  %4 = phi i32 [%in, %entry], [%1, %IF0], [%3, %IF1]
++  store i32 %4, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @nested_if_else
++; CHECK: IF_PREDICATE_SET
++; CHECK: PRED_SET{{[EGN][ET]*}}_INT Pred,
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: LSH{{[LR] T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, 1, 0(0.000000e+00) Pred_sel
++; CHECK: ENDIF
++define void @nested_if_else(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp sgt i32 %in, 0
++  br i1 %0, label %IF0, label %ENDIF
++
++IF0:
++  %1 = add i32 %in, 10
++  %2 = icmp sgt i32 %1, 0
++  br i1 %2, label %IF1, label %ELSE1
++
++IF1:
++  %3 = shl i32  %1, 1
++  br label %ENDIF
++
++ELSE1:
++  %4 = lshr i32 %in, 1
++  br label %ENDIF
++
++ENDIF:
++  %5 = phi i32 [%in, %entry], [%3, %IF1], [%4, %ELSE1]
++  store i32 %5, i32 addrspace(1)* %out
++  ret void
++}
 diff --git a/test/CodeGen/R600/reciprocal.ll b/test/CodeGen/R600/reciprocal.ll
 new file mode 100644
 index 0000000..6838c1a
@@ -22517,7 +24436,7 @@ index 0000000..3556fac
 +}
 diff --git a/test/CodeGen/R600/selectcc-icmp-select-float.ll b/test/CodeGen/R600/selectcc-icmp-select-float.ll
 new file mode 100644
-index 0000000..f65a300
+index 0000000..359ca1e
 --- /dev/null
 +++ b/test/CodeGen/R600/selectcc-icmp-select-float.ll
 @@ -0,0 +1,15 @@
@@ -22525,7 +24444,7 @@ index 0000000..f65a300
 +
 +; Note additional optimizations may cause this SGT to be replaced with a
 +; CND* instruction.
-+; CHECK: SGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal.x, -1}}
 +; Test a selectcc with i32 LHS/RHS and float True/False
 +
 +define void @test(float addrspace(1)* %out, i32 addrspace(1)* %in) {
@@ -22570,6 +24489,149 @@ index 0000000..b38078e
 +  store i32 %3, i32 addrspace(1)* %out
 +  ret void
 +}
+diff --git a/test/CodeGen/R600/set-dx10.ll b/test/CodeGen/R600/set-dx10.ll
+new file mode 100644
+index 0000000..54febcf
+--- /dev/null
++++ b/test/CodeGen/R600/set-dx10.ll
+@@ -0,0 +1,137 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests check that floating-point comparisons whose results are used by
++; select to produce the integer values true (-1) and false (0) are lowered to
++; one of the SET*DX10 instructions.
++
++; CHECK: @fcmp_une_select_fptosi
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_une_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp une float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_une_select_i32
++; CHECK: SETNE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_une_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp une float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ueq_select_fptosi
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ueq_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ueq float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ueq_select_i32
++; CHECK: SETE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ueq_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ueq float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ugt_select_fptosi
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ugt_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ugt float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ugt_select_i32
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_ugt_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ugt float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_uge_select_fptosi
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_uge_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp uge float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_uge_select_i32
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, T{{[0-9]+\.[XYZW]}}, literal.x, 1084227584(5.000000e+00)
++define void @fcmp_uge_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp uge float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ule_select_fptosi
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ule_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ule float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ule_select_i32
++; CHECK: SETGE_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ule_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ule float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ult_select_fptosi
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ult_select_fptosi(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ult float %in, 5.0
++  %1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
++  %2 = fsub float -0.000000e+00, %1
++  %3 = fptosi float %2 to i32
++  store i32 %3, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @fcmp_ult_select_i32
++; CHECK: SETGT_DX10 T{{[0-9]+\.[XYZW]}}, literal.x, T{{[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @fcmp_ult_select_i32(i32 addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ult float %in, 5.0
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
 diff --git a/test/CodeGen/R600/setcc.v4i32.ll b/test/CodeGen/R600/setcc.v4i32.ll
 new file mode 100644
 index 0000000..0752f2e
@@ -22590,12 +24652,13 @@ index 0000000..0752f2e
 +}
 diff --git a/test/CodeGen/R600/short-args.ll b/test/CodeGen/R600/short-args.ll
 new file mode 100644
-index 0000000..1070250
+index 0000000..b69e327
 --- /dev/null
 +++ b/test/CodeGen/R600/short-args.ll
-@@ -0,0 +1,37 @@
+@@ -0,0 +1,41 @@
 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +
++; CHECK: @i8_arg
 +; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
 +
 +define void @i8_arg(i32 addrspace(1)* nocapture %out, i8 %in) nounwind {
@@ -22605,6 +24668,7 @@ index 0000000..1070250
 +  ret void
 +}
 +
++; CHECK: @i8_zext_arg
 +; CHECK: VTX_READ_8 T{{[0-9]+\.X, T[0-9]+\.X}}
 +
 +define void @i8_zext_arg(i32 addrspace(1)* nocapture %out, i8 zeroext %in) nounwind {
@@ -22614,6 +24678,7 @@ index 0000000..1070250
 +  ret void
 +}
 +
++; CHECK: @i16_arg
 +; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
 +
 +define void @i16_arg(i32 addrspace(1)* nocapture %out, i16 %in) nounwind {
@@ -22623,6 +24688,7 @@ index 0000000..1070250
 +  ret void
 +}
 +
++; CHECK: @i16_zext_arg
 +; CHECK: VTX_READ_16 T{{[0-9]+\.X, T[0-9]+\.X}}
 +
 +define void @i16_zext_arg(i32 addrspace(1)* nocapture %out, i16 zeroext %in) nounwind {
@@ -22682,6 +24748,95 @@ index 0000000..47657a6
 +  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
 +  ret void
 +}
+diff --git a/test/CodeGen/R600/unsupported-cc.ll b/test/CodeGen/R600/unsupported-cc.ll
+new file mode 100644
+index 0000000..b48c591
+--- /dev/null
++++ b/test/CodeGen/R600/unsupported-cc.ll
+@@ -0,0 +1,83 @@
++; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
++
++; These tests are for condition codes that are not supported by the hardware
++
++; CHECK: @slt
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @slt(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp slt i32 %in, 5
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @ult_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 5(7.006492e-45)
++define void @ult_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp ult i32 %in, 5
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @ult_float
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ult_float(float addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ult float %in, 5.0
++  %1 = select i1 %0, float 1.0, float 0.0
++  store float %1, float addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @olt
++; CHECK: SETGT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @olt(float addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp olt float %in, 5.0
++  %1 = select i1 %0, float 1.0, float 0.0
++  store float %1, float addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @sle
++; CHECK: SETGT_INT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @sle(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp sle i32 %in, 5
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @ule_i32
++; CHECK: SETGT_UINT T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 6(8.407791e-45)
++define void @ule_i32(i32 addrspace(1)* %out, i32 %in) {
++entry:
++  %0 = icmp ule i32 %in, 5
++  %1 = select i1 %0, i32 -1, i32 0
++  store i32 %1, i32 addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @ule_float
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ule_float(float addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ule float %in, 5.0
++  %1 = select i1 %0, float 1.0, float 0.0
++  store float %1, float addrspace(1)* %out
++  ret void
++}
++
++; CHECK: @ole
++; CHECK: SETGE T{{[0-9]+\.[XYZW]}}, literal.x, {{T[0-9]+\.[XYZW]}}, 1084227584(5.000000e+00)
++define void @ole(float addrspace(1)* %out, float %in) {
++entry:
++  %0 = fcmp ole float %in, 5.0
++  %1 = select i1 %0, float 1.0, float 0.0
++  store float %1, float addrspace(1)* %out
++  ret void
++}
 diff --git a/test/CodeGen/R600/urem.v4i32.ll b/test/CodeGen/R600/urem.v4i32.ll
 new file mode 100644
 index 0000000..2e7388c
@@ -22705,15 +24860,13 @@ index 0000000..2e7388c
 +}
 diff --git a/test/CodeGen/R600/vec4-expand.ll b/test/CodeGen/R600/vec4-expand.ll
 new file mode 100644
-index 0000000..47cbf82
+index 0000000..8f62bc6
 --- /dev/null
 +++ b/test/CodeGen/R600/vec4-expand.ll
-@@ -0,0 +1,52 @@
-+; There are bugs in the DAGCombiner that prevent this test from passing.
-+; XFAIL: *
-+
+@@ -0,0 +1,53 @@
 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +
++; CHECK: @fp_to_sint
 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: FLT_TO_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22726,6 +24879,7 @@ index 0000000..47cbf82
 +  ret void
 +}
 +
++; CHECK: @fp_to_uint
 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: FLT_TO_UINT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22738,6 +24892,7 @@ index 0000000..47cbf82
 +  ret void
 +}
 +
++; CHECK: @sint_to_fp
 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: INT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22750,6 +24905,7 @@ index 0000000..47cbf82
 +  ret void
 +}
 +
++; CHECK: @uint_to_fp
 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; CHECK: UINT_TO_FLT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
@@ -22804,6 +24960,15 @@ index 0000000..62cdcf5
 +declare <4 x float> @llvm.SI.vs.load.input(<4 x i32>, i32, i32)
 +
 +declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
--- 
-1.8.0.2
-
+diff --git a/test/CodeGen/X86/cvtv2f32.ll b/test/CodeGen/X86/cvtv2f32.ll
+index 466b096..d11bb9e 100644
+--- a/test/CodeGen/X86/cvtv2f32.ll
++++ b/test/CodeGen/X86/cvtv2f32.ll
+@@ -1,3 +1,7 @@
++; A bug fix in the DAGCombiner made this test fail, so marking as xfail
++; until this can be investigated further.
++; XFAIL: *
++
+ ; RUN: llc < %s -mtriple=i686-linux-pc -mcpu=corei7 | FileCheck %s
+ 
+ define <2 x float> @foo(i32 %x, i32 %y, <2 x float> %v) {

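The reworked sample intrinsics above (int_SI_sample, int_SI_sampleb, int_SI_samplel)
now take the texture address as an overloaded vector type (v1i32 up to v16i32,
matching the SamplePatterns instantiations) plus a trailing i32 that selects the
texture type (TEX_RECT, TEX_ARRAY, TEX_SHADOW, TEX_SHADOW_ARRAY, or a plain
immediate for the default case). A minimal sketch of a call under the new
signature follows; the function name and the .v2i32 overload suffix are
illustrative assumptions, not taken from the patch:

; Hypothetical IR, not part of the patch: a plain 2D lookup through the
; updated intrinsic. i32 15 is the full XYZW writemask; the final i32 0
; selects the default (non-RECT, non-ARRAY, non-SHADOW) path, which the
; SamplePattern class above matches.
define <4 x float> @sample_2d(<2 x i32> %coord, <8 x i32> %rsrc, <4 x i32> %sampler) {
entry:
  %tex = call <4 x float> @llvm.SI.sample.v2i32(i32 15, <2 x i32> %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 0)
  ret <4 x float> %tex
}

declare <4 x float> @llvm.SI.sample.v2i32(i32, <2 x i32>, <8 x i32>, <4 x i32>, i32) readonly
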
diff --git a/sys-devel/llvm/llvm-3.2.ebuild b/sys-devel/llvm/llvm-3.2.ebuild
index 7171bfc..ceb16bb 100644
--- a/sys-devel/llvm/llvm-3.2.ebuild
+++ b/sys-devel/llvm/llvm-3.2.ebuild
@@ -1,33 +1,38 @@
-# Copyright 1999-2012 Gentoo Foundation
+# Copyright 1999-2013 Gentoo Foundation
 # Distributed under the terms of the GNU General Public License v2
-# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.1 2012/12/21 09:18:12 voyageur Exp $
+# $Header: /var/cvsroot/gentoo-x86/sys-devel/llvm/llvm-3.2.ebuild,v 1.6 2013/02/27 06:02:15 zmedico Exp $
 
 EAPI=5
-PYTHON_DEPEND="2"
-inherit eutils flag-o-matic multilib toolchain-funcs python pax-utils
+
+# pypy gives around 1700 unresolved tests because the open-file limit is
+# exceeded; its GC probably does not close file handles fast enough.
+PYTHON_COMPAT=( python{2_5,2_6,2_7} )
+
+inherit eutils flag-o-matic multilib python-any-r1 toolchain-funcs pax-utils
 
 DESCRIPTION="Low Level Virtual Machine"
 HOMEPAGE="http://llvm.org/"
-SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz"
+SRC_URI="http://llvm.org/releases/${PV}/${P}.src.tar.gz
+	!doc? ( http://dev.gentoo.org/~voyageur/distfiles/${P}-manpages.tar.bz2 )"
 
 LICENSE="UoI-NCSA"
 SLOT="0"
-KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~x86-linux ~ppc-macos ~x64-macos"
+KEYWORDS="~amd64 ~arm ~ppc ~x86 ~amd64-fbsd ~x86-fbsd ~x64-freebsd ~amd64-linux ~arm-linux ~x86-linux ~ppc-macos ~x64-macos"
 IUSE="debug doc gold +libffi multitarget ocaml test udis86 vim-syntax"
 
 DEPEND="dev-lang/perl
-	dev-python/sphinx
 	>=sys-devel/make-3.79
 	>=sys-devel/flex-2.5.4
 	>=sys-devel/bison-1.875d
 	|| ( >=sys-devel/gcc-3.0 >=sys-devel/gcc-apple-4.2.1 )
 	|| ( >=sys-devel/binutils-2.18 >=sys-devel/binutils-apple-3.2.3 )
+	doc? ( dev-python/sphinx )
 	gold? ( >=sys-devel/binutils-2.22[cxx] )
 	libffi? ( virtual/pkgconfig
 		virtual/libffi )
 	ocaml? ( dev-lang/ocaml )
-	udis86? ( amd64? ( dev-libs/udis86[pic] )
-		!amd64? ( dev-libs/udis86 ) )"
+	udis86? ( dev-libs/udis86[pic(+)] )
+	${PYTHON_DEPS}"
 RDEPEND="dev-lang/perl
 	libffi? ( virtual/libffi )
 	vim-syntax? ( || ( app-editors/vim app-editors/gvim ) )"
@@ -36,8 +41,7 @@ S=${WORKDIR}/${P}.src
 
 pkg_setup() {
 	# Required for test and build
-	python_set_active_version 2
-	python_pkg_setup
+	python-any-r1_pkg_setup
 
 	# need to check if the active compiler is ok
 
@@ -64,12 +68,12 @@ pkg_setup() {
 
 	if [[ ${CHOST} == x86_64-* && ${broken_gcc_amd64} == *" ${version} "* ]];
 	then
-		 elog "Your version of gcc is known to miscompile llvm in amd64"
-		 elog "architectures.  Check"
-		 elog "http://www.llvm.org/docs/GettingStarted.html for possible"
-		 elog "solutions."
+		elog "Your version of gcc is known to miscompile llvm in amd64"
+		elog "architectures.  Check"
+		elog "http://www.llvm.org/docs/GettingStarted.html for possible"
+		elog "solutions."
 		die "Your currently active version of gcc is known to miscompile llvm"
-	 fi
+	fi
 }
 
 src_prepare() {
@@ -96,12 +100,9 @@ src_prepare() {
 	sed -e "/NO_INSTALL = 1/s/^/#/" -i utils/FileCheck/Makefile \
 		|| die "FileCheck Makefile sed failed"
 
-	# Specify python version
-	python_convert_shebangs -r 2 test/Scripts
-
 	epatch "${FILESDIR}"/${PN}-3.2-nodoctargz.patch
 	epatch "${FILESDIR}"/${PN}-3.0-PPC_macro.patch
-	epatch "${FILESDIR}"/0001-Add-R600-backend.patch
+	epatch "${FILESDIR}"/R600-Mesa-9.1.patch
 
 	# User patches
 	epatch_user
@@ -150,20 +151,28 @@ src_configure() {
 src_compile() {
 	emake VERBOSE=1 KEEP_SYMBOLS=1 REQUIRES_RTTI=1
 
-	emake -C docs -f Makefile.sphinx man
-	use doc && emake -C docs -f Makefile.sphinx html
+	if use doc; then
+		emake -C docs -f Makefile.sphinx man html
+	fi
+	#	emake -C docs -f Makefile.sphinx html
 
 	pax-mark m Release/bin/lli
 	if use test; then
 		pax-mark m unittests/ExecutionEngine/JIT/Release/JITTests
+		pax-mark m unittests/ExecutionEngine/MCJIT/Release/MCJITTests
+		pax-mark m unittests/Support/Release/SupportTests
 	fi
 }
 
 src_install() {
 	emake KEEP_SYMBOLS=1 DESTDIR="${D}" install
 
-	doman docs/_build/man/*.1
-	use doc && dohtml -r docs/_build/html/
+	if use doc; then
+		doman docs/_build/man/*.1
+		dohtml -r docs/_build/html/
+	else
+		doman "${WORKDIR}"/${P}-manpages/*.1
+	fi
 
 	if use vim-syntax; then
 		insinto /usr/share/vim/vimfiles/syntax
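The ebuild hunks above migrate from the old python eclass (python_set_active_version / python_convert_shebangs) to python-any-r1, which treats Python as a pure build-time tool: PYTHON_COMPAT names the implementations the build scripts are known to work with, ${PYTHON_DEPS} pulls in the matching dependency, and python-any-r1_pkg_setup picks any installed candidate. A minimal sketch of the pattern, as a stand-alone toy ebuild rather than this one:

  # Illustrative only; everything besides the eclass calls is a placeholder.
  EAPI=5
  PYTHON_COMPAT=( python2_5 python2_6 python2_7 )
  inherit python-any-r1

  # Python is needed only to drive the build and test scripts.
  DEPEND="${PYTHON_DEPS}"

  pkg_setup() {
      # Picks any installed implementation from PYTHON_COMPAT and
      # exports it for the build (e.g. EPYTHON).
      python-any-r1_pkg_setup
  }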

diff --git a/sys-devel/llvm/metadata.xml b/sys-devel/llvm/metadata.xml
index e5a362b..38e16d8 100644
--- a/sys-devel/llvm/metadata.xml
+++ b/sys-devel/llvm/metadata.xml
@@ -16,7 +16,6 @@
    4. LLVM does not imply things that you would expect from a high-level virtual machine. It does not require garbage collection or run-time code generation (In fact, LLVM makes a great static compiler!). Note that optional LLVM components can be used to build high-level virtual machines and other systems that need these services.</longdescription>
 	<use>
 		<flag name='gold'>Build the gold linker plugin</flag>
-		<flag name='llvm-gcc'>Build LLVM with <pkg>sys-devel/llvm-gcc</pkg></flag>
 		<flag name='multitarget'>Build all host targets (default: host only)</flag>
 		<flag name='udis86'>Enable support for <pkg>dev-libs/udis86</pkg> disassembler library</flag>
 	</use>
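Since the commit renames 0001-Add-R600-backend.patch to R600-Mesa-9.1.patch and rewrites most of its content, a quick local sanity check that the new patch still applies to a pristine tree might look like the following (purely illustrative, not part of the commit; at build time epatch autodetects the -p level itself):

  # Hypothetical dry run against an unpacked llvm-3.2 source tree.
  cd llvm-3.2.src
  patch -p1 --dry-run < ../R600-Mesa-9.1.patch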

