* [gentoo-user] hard drive encryption
From: Valmor de Almeida @ 2012-03-11 15:38 UTC
To: gentoo-user

Hello,

I have not looked at encryption before, and I now find myself in a
situation where I have to encrypt my hard drive. I keep /, /boot, and
swap outside LVM; everything else is under LVM. I think all I need to do
is encrypt /home, which is under LVM. I use reiserfs.

I would appreciate suggestions and pointers on what is practical and
simple for accomplishing this task with a minimum of downtime.

Thanks,

--
Valmor
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-11 18:29 UTC
To: gentoo-user

On 11.03.2012 16:38, Valmor de Almeida wrote:
> I have not looked at encryption before and find myself in a situation
> that I have to encrypt my hard drive. I keep /, /boot, and swap outside
> LVM, everything else is under LVM. I think all I need to do is to
> encrypt /home which is under LVM. I use reiserfs.
>
> I would appreciate suggestion and pointers on what it is practical and
> simple in order to accomplish this task with a minimum of downtime.

Is it acceptable for you to have a command-line prompt for the password
when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt
init script. /etc/conf.d/dmcrypt should contain some examples. As you
want to encrypt an LVM volume, the lvm init script needs to be started
before dmcrypt. As far as I can see, there is no strict dependency
between those two scripts. You can add one by putting this line in
/etc/rc.conf:

rc_dmcrypt_after="lvm"

For creating a LUKS-encrypted volume, look at
http://en.gentoo-wiki.com/wiki/DM-Crypt

You won't need most of what is written there; just section 9,
"Administering LUKS", and the kernel config in section 2, "Assumptions".

Concerning downtime, I'm not aware of any solution that avoids copying
the data over to the new volume. If downtime is absolutely critical, ask
and we can work something out that minimizes the time.

Regards,
Florian Philipp
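The steps described above might look roughly like the following. This is
only a dry-run sketch that prints a possible command sequence and runs
nothing. The volume group vg0, the logical volume home_crypt, the mapping
name crypt-home, the 50G size and the /mnt/newhome mount point are all
placeholders that do not come from the thread, and the target=/source=
lines assume the syntax used by the examples shipped in
/etc/conf.d/dmcrypt.

```shell
#!/bin/sh
# Dry-run sketch: prints a possible command sequence, runs nothing.
# All names below are placeholders (assumptions), not values from the thread.
VG=${VG:-vg0}            # hypothetical volume group
SIZE=${SIZE:-50G}        # hypothetical size of the new encrypted LV
NEW_LV=home_crypt        # hypothetical name of the new logical volume
MAPPING=crypt-home       # hypothetical device-mapper name

plan_home_migration() {
    cat <<EOF
# 1. Create and format the encrypted volume (system stays up).
lvcreate -L $SIZE -n $NEW_LV $VG
cryptsetup luksFormat /dev/$VG/$NEW_LV
cryptsetup luksOpen /dev/$VG/$NEW_LV $MAPPING
mkfs.reiserfs /dev/mapper/$MAPPING
mkdir -p /mnt/newhome
mount /dev/mapper/$MAPPING /mnt/newhome
# 2. First copy while /home is still in use.
rsync -aHAX /home/ /mnt/newhome/
# 3. With users logged out: short second pass for the differences.
rsync -aHAX --delete /home/ /mnt/newhome/
# 4. Entries for /etc/conf.d/dmcrypt and /etc/rc.conf.
target=$MAPPING
source='/dev/$VG/$NEW_LV'
rc_dmcrypt_after="lvm"
EOF
}

plan_home_migration
```

The second rsync pass is the part meant to keep downtime short: the bulk
of the data is copied while the system is in use, and only the
differences are copied once /home is quiet. After that, /home would be
mounted from the new mapping instead of the old volume.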
* Re: [gentoo-user] hard drive encryption
From: Valmor de Almeida @ 2012-03-13 11:55 UTC
To: gentoo-user

On 03/11/2012 02:29 PM, Florian Philipp wrote:
> Is it acceptable for you to have a commandline prompt for the password
> when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt

I think so.

> init script. /etc/conf.d/dmcrypt should contain some examples. As you
> want to encrypt an LVM volume, the lvm init script needs to be started
> before this. As I see it, there is no strict dependency between those
> two scripts. You can add this by adding this line to /etc/rc.conf:
> rc_dmcrypt_after="lvm"
>
> For creating a LUKS-encrypted volume, look at
> http://en.gentoo-wiki.com/wiki/DM-Crypt

Currently looking at this.

> You won't need most of what is written there; just section 9,
> "Administering LUKS" and the kernel config in section 2, "Assumptions".
>
> Concerning downtime, I'm not aware of any solution that avoids copying
> the data over to the new volume. If downtime is absolutely critical, ask
> and we can work something out that minimizes the time.

Since I am planning to encrypt only /home under LVM control, what kind
of overhead should I expect?

Thanks,

--
Valmor
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-13 16:11 UTC
To: gentoo-user

On 13.03.2012 12:55, Valmor de Almeida wrote:
> Since I am planning to encrypt only home/ under LVM control, what kind
> of overhead should I expect?

What do you mean by overhead? CPU utilization? In that case the
overhead is minimal, especially when you run a 64-bit kernel with the
optimized AES kernel module.

Measured on a Core i5:

time cat Video/*.* >/dev/null

real    0m42.918s
user    0m0.023s
sys     0m2.027s

That was a sequential read of roughly 3.5 GB with empty caches. This
corresponds to the normal disk speed.

Regards,
Florian Philipp
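As a rough cross-check of the figures above, the throughput and the CPU
share of the reading process can be worked out from the quoted timings.
The sketch reads "roughly 3.5 GB" as 3.5 * 10^9 bytes, which is an
assumption, and the function name is made up for illustration. One
caveat: as far as I know, dm-crypt does its decryption in kernel worker
threads, so the sys time reported for cat probably does not include the
decryption cost itself.

```shell
#!/bin/sh
# read_stats BYTES REAL USER SYS
# Prints throughput and the CPU share of the reading process.
# Times are in seconds, as reported by the shell's "time".
read_stats() {
    awk -v b="$1" -v r="$2" -v u="$3" -v s="$4" 'BEGIN {
        printf "throughput: %.1f MB/s\n", b / r / 1e6
        printf "cpu share of reader: %.1f%%\n", (u + s) / r * 100
    }'
}

# Figures from the message above; the byte count is an assumption.
read_stats 3500000000 42.918 0.023 2.027
```

To judge the encryption cost itself, it would probably be more telling
to watch overall CPU usage (for example in top) during the read than to
rely on the per-process numbers.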
* Re: [gentoo-user] hard drive encryption
From: Michael Mol @ 2012-03-13 16:26 UTC
To: gentoo-user

On Tue, Mar 13, 2012 at 12:11 PM, Florian Philipp <lists@binarywings.net> wrote:
> On 13.03.2012 12:55, Valmor de Almeida wrote:
>> On 03/11/2012 02:29 PM, Florian Philipp wrote:
>>> Concerning downtime, I'm not aware of any solution that avoids copying
>>> the data over to the new volume.
>>> If downtime is absolutely critical, ask
>>> and we can work something out that minimizes the time.
>>
>> Since I am planning to encrypt only home/ under LVM control, what kind
>> of overhead should I expect?
>
> What do you mean with overhead? CPU utilization? In that case the
> overhead is minimal, especially when you run a 64-bit kernel with the
> optimized AES kernel module.

Rough guess: latency. With encryption, you can't DMA disk data
directly into a process's address space, because you need the decrypt
hop.

Try running bonnie++ on encrypted vs. non-encrypted volumes. (Or not; I
doubt you have the time and materials to do a good, meaningful set of
time trials.)

--
:wq
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-13 16:49 UTC
To: gentoo-user

On 13.03.2012 17:26, Michael Mol wrote:
> Rough guess: Latency. With encryption, you can't DMA disk data
> directly into a process's address space, because you need the decrypt
> hop.

Good call. I wouldn't have thought of that.

> Try running bonnie++ on encrypted vs non-encrypted volumes. (Or not; I
> doubt you have the time and materials to do a good, meaningful set of
> time trials)

Yeah, that sounds like something for which you need a very dull winter
day. Besides, I've already lost a poorly cooled HDD during a benchmark.

Regards,
Florian Philipp
* Re: [gentoo-user] hard drive encryption
From: Neil Bothwick @ 2012-03-13 16:54 UTC
To: gentoo-user

On Tue, 13 Mar 2012 17:49:40 +0100, Florian Philipp wrote:

> Besides, I've already lost a poorly cooled HDD on a benchmark.

Better than losing it on real data.

--
Neil Bothwick

Why do they call it a TV set when you only get one?
* Re: [gentoo-user] hard drive encryption
From: Michael Mol @ 2012-03-13 16:54 UTC
To: gentoo-user

On Tue, Mar 13, 2012 at 12:49 PM, Florian Philipp <lists@binarywings.net> wrote:
> Yeah, that sounds like something for which you need a very dull winter
> day. Besides, I've already lost a poorly cooled HDD on a benchmark.

Sounds like something we can do at my LUG at one of our weekly socials.
The part I don't know is how to set this kind of thing up and how to
tune it; I don't want it to be like Microsoft's comparison of SQL Server
against MySQL from a decade or so ago, where they didn't tune MySQL for
their bench workload.

--
:wq
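One crude way to start, short of a full bonnie++ run, would be to time
the same sequential write on a plain and on an encrypted mount point.
Note that this sketch swaps in dd for bonnie++, so it only measures
sequential throughput and says nothing about the latency effect raised
earlier in the thread. The function name is made up, and the mount
points in the usage note are hypothetical.

```shell
#!/bin/sh
# time_sequential_write DIR SIZE_MIB
# Writes SIZE_MIB mebibytes of zeros into DIR, reports the elapsed
# wall-clock time in whole seconds, and removes the test file again.
# Uses dd as a stand-in; this is not bonnie++.
time_sequential_write() {
    dir=$1
    size_mib=$2
    file="$dir/seqwrite.$$"
    start=$(date +%s)
    # conv=fsync makes dd flush to disk before exiting, so the page
    # cache does not hide the cost of the write.
    dd if=/dev/zero of="$file" bs=1M count="$size_mib" conv=fsync 2>/dev/null
    end=$(date +%s)
    rm -f "$file"
    echo "$dir: $size_mib MiB written in $((end - start)) s"
}
```

It could be called as `time_sequential_write /mnt/plain 4096` and
`time_sequential_write /mnt/crypt 4096`, ideally several times each and
on volumes of the same disk with the same filesystem. Because the timer
only has one-second resolution, the size should be large enough for a
run to take at least a minute or so.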
* Re: [gentoo-user] hard drive encryption
From: Frank Steinmetzger @ 2012-03-13 17:45 UTC
To: gentoo-user

On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

> > Since I am planning to encrypt only home/ under LVM control, what kind
> > of overhead should I expect?
>
> What do you mean with overhead? CPU utilization? In that case the
> overhead is minimal, especially when you run a 64-bit kernel with the
> optimized AES kernel module.

Speaking of that...
I always wondered what the exact difference was between AES and AES i586.
I gather that it's about optimisation for a specific architecture.
But which one would be best for my i686 Core 2 Duo?

--
Gruß | Greetings | Qapla'
I forbid any use of my email addresses with Facebook services.

A pessimist is an optimist who's given it some thought.
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-13 18:06 UTC
To: gentoo-user

On 13.03.2012 18:45, Frank Steinmetzger wrote:
> Speaking of that...
> I always wondered what the exact difference was between AES and AES i586. I
> can gather myself that it's about optimisation for a specific architecture.
> But which one would be best for my i686 Core 2 Duo?

From what I can see in the kernel sources, there is a generic AES
implementation using nothing but portable C code, and then there is
"aes-i586" assembler code with "aes_glue" C code. So I assume the i586
version is better for you, unless GCC suddenly got a lot better at
optimizing code.
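To see which AES implementations a running kernel has actually
registered, /proc/crypto can be inspected. The sketch below lists every
driver that provides the plain "aes" cipher together with its priority,
highest first; as far as I know, the kernel prefers the implementation
with the highest priority when more than one is available. The function
name is made up for illustration.

```shell
#!/bin/sh
# list_aes_drivers: reads text in /proc/crypto format on stdin and
# prints "PRIORITY DRIVER" for each entry whose name is exactly "aes",
# sorted by priority in descending order.
list_aes_drivers() {
    awk -F' *: *' '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" { if (name == "aes") print $2, driver }
    ' | sort -rn
}

# Only query the running kernel if the file is available.
if [ -r /proc/crypto ]; then
    list_aes_drivers < /proc/crypto
fi
```

If both the generic C code and the assembler version are built, both
should show up here, and the one listed first is the one the kernel
would be expected to pick.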
* Re: [gentoo-user] hard drive encryption
From: Michael Mol @ 2012-03-13 18:18 UTC
To: gentoo-user

On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp <lists@binarywings.net> wrote:
> From what I can see in the kernel sources, there is a generic AES
> implementation using nothing but portable C code and then there is
> "aes-i586" assembler code with "aes_glue" C code.
>
> So I assume the i586
> version is better for you --- unless GCC suddenly got a lot better at
> optimizing code.

Since when, exactly? GCC isn't the best compiler at optimization, but
I fully expect current versions to produce better code for x86-64 than
hand-tuned i586. Wider registers, more registers, crypto acceleration
instructions and SIMD instructions are all very nice to have. I don't
know the specifics of AES, though, or what kind of crypto algorithm it
is, so it's entirely possible that one can't effectively parallelize
it except in some relatively unique circumstances.

--
:wq
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-13 18:58 UTC
To: gentoo-user

On 13.03.2012 19:18, Michael Mol wrote:
> Since when, exactly? GCC isn't the best compiler at optimization, but
> I fully expect current versions to produce better code for x86-64 than
> hand-tuned i586. Wider registers, more registers, crypto acceleration
> instructions and SIMD instructions are all very nice to have. I don't
> know the specifics of AES, though, or what kind of crypto algorithm it
> is, so it's entirely possible that one can't effectively parallelize
> it except in some relatively unique circumstances.

One sec.
We are talking about a Core 2 Duo running in 32-bit mode, right?
That's what the i686 reference in the question meant, or at least
that's what I assumed.

If we talk about 32-bit mode, none of what you describe is available.
Those additional registers and instructions are not accessible with i686
instructions. A Core 2 also has no AES instructions.

Of course, GCC could make use of what it knows about the CPU, like
number of parallel pipelines, pipeline depth, cache size, instructions
added in i686 and so on. But even then I doubt it can outperform
hand-tuned assembler, even if it is for a slightly older instruction set.

If instead we are talking about a Core 2 Duo running in x86_64 mode, we
should be talking about the aes-x86_64 module instead of the aes-i586
module, and that makes use of the complete instruction set of the Core 2,
including SSE2.

Regards,
Florian Philipp
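Whether a given CPU offers the features discussed here can be read from
the flags line in /proc/cpuinfo: lm indicates 64-bit (long mode)
capability, sse2 indicates SSE2, and aes indicates the AES instructions.
A small sketch, with a made-up helper name:

```shell
#!/bin/sh
# cpu_has_flag FLAG [FILE]: succeeds if FLAG appears on the first
# "flags" line of FILE (default: /proc/cpuinfo).
# The helper name is hypothetical.
cpu_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null |
        tr ' \t' '\n\n' |
        grep -qxF "$1"
}

for flag in lm sse2 aes; do
    if cpu_has_flag "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```

Going by the statement above that a Core 2 has no AES instructions, such
a machine would be expected to report the aes flag as missing while
still listing lm and sse2, even when it is running a 32-bit kernel.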
* Re: [gentoo-user] hard drive encryption
From: Michael Mol @ 2012-03-13 19:13 UTC
To: gentoo-user

On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp <lists@binarywings.net> wrote:
> On 13.03.2012 19:18, Michael Mol wrote:
>> Since when, exactly? GCC isn't the best compiler at optimization, but
>> I fully expect current versions to produce better code for x86-64 than
>> hand-tuned i586. Wider registers, more registers, crypto acceleration
>> instructions and SIMD instructions are all very nice to have.
>> I don't know the specifics of AES, though, or what kind of crypto
>> algorithm it is, so it's entirely possible that one can't effectively
>> parallelize it except in some relatively unique circumstances.
>
> One sec. We are talking about an Core2 Duo running in 32bit mode, right?
> That's what the i686 reference in the question meant --- or at least,
> that's what I assumed.

I think you're right; I missed that part.

> If we talk about 32bit mode, none of what you describe is available.
> Those additional registers and instructions are not accessible with i686
> instructions. A Core 2 also has no AES instructions.
>
> Of course, GCC could make use of what it knows about the CPU, like
> number of parallel pipelines, pipeline depth, cache size, instructions
> added in i686 and so on. But even then I doubt it can outperform
> hand-tuned assembler, even if it is for a slightly older instruction set.

I'm still not sure why. I'll posit that some badly written C could
place constraints on the compiler's optimizer, but GCC should have
little problem handling well-written C, separating semantics from
syntax and finding good transforms of the original code to get
provably identical results. Unless I'm grossly overestimating the
capabilities of its AST processing and optimization engine.

> If instead we are talking about an Core 2 Duo running in x86_64 mode, we
> should be talking about the aes-x86_64 module instead of the aes-i586
> module and that makes use of the complete instruction set of the Core 2,
> including SSE2.

FWIW, SSE2 is available on 32-bit processors; I have code in the field
using SSE2 on Pentium 4s.

--
:wq
* Re: [gentoo-user] hard drive encryption
From: Florian Philipp @ 2012-03-13 19:30 UTC
To: gentoo-user

On 13.03.2012 20:13, Michael Mol wrote:
> On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp <lists@binarywings.net> wrote:
>> On 13.03.2012 19:18, Michael Mol wrote:
>>> Since when, exactly? GCC isn't the best compiler at optimization, but
>>> I fully expect current versions to produce better code for x86-64 than
>>> hand-tuned i586. Wider registers, more registers, crypto acceleration
>>> instructions and SIMD instructions are all very nice to have.
>>> I don't know the specifics of AES, though, or what kind of crypto
>>> algorithm it is, so it's entirely possible that one can't effectively
>>> parallelize it except in some relatively unique circumstances.
>>
>> Of course, GCC could make use of what it knows about the CPU, like
>> number of parallel pipelines, pipeline depth, cache size, instructions
>> added in i686 and so on. But even then I doubt it can outperform
>> hand-tuned assembler, even if it is for a slightly older instruction set.
>
> I'm still not sure why. I'll posit that some badly-written C could
> place constraints on the compiler's optimizer, but GCC should have
> little problem handling well-written C, separating semantics from
> syntax and finding good transforms of the original code to get
> proofably-same results. Unless I'm grossly overestimating the
> capabilities of its AST processing and optimization engine.

Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a
profiling run to allow the compiler to predict loop and jump certainties
and so on.

But, by all means, let's test it! It's not like we cannot.
Unfortunately, I don't have a 32-bit Gentoo machine at hand where I could
test it right now.

>> If instead we are talking about an Core 2 Duo running in x86_64 mode, we
>> should be talking about the aes-x86_64 module instead of the aes-i586
>> module and that makes use of the complete instruction set of the Core 2,
>> including SSE2.
>
> FWIW, SSE2 is available on 32-bit processors; I have code in the field
> using SSE2 on Pentium 4s.
> Um, yeah. I should have clarified that. I meant that for x86_64 machines, the compiler as well as the assembler programmer can safely assume that SSE2 is available. For generic i686 assembler code, you cannot. Regards, Florian Philipp [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 19:30 ` Florian Philipp @ 2012-03-13 19:42 ` Michael Mol 0 siblings, 0 replies; 22+ messages in thread From: Michael Mol @ 2012-03-13 19:42 UTC (permalink / raw To: gentoo-user On Tue, Mar 13, 2012 at 3:30 PM, Florian Philipp <lists@binarywings.net> wrote: > Am 13.03.2012 20:13, schrieb Michael Mol: >> On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp <lists@binarywings.net> wrote: >>> Am 13.03.2012 19:18, schrieb Michael Mol: >>>> On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp <lists@binarywings.net> wrote: >>>>> Am 13.03.2012 18:45, schrieb Frank Steinmetzger: >>>>>> On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote: >>>>>> >>>>>>>> Since I am planning to encrypt only home/ under LVM control, what kind >>>>>>>> of overhead should I expect? >>>>>>> >>>>>>> What do you mean with overhead? CPU utilization? In that case the >>>>>>> overhead is minimal, especially when you run a 64-bit kernel with the >>>>>>> optimized AES kernel module. >>>>>> >>>>>> Speaking of that... >>>>>> I always wondered what the exact difference was between AES and AES i586. I >>>>>> can gather myself that it's about optimisation for a specific architecture. >>>>>> But which one would be best for my i686 Core 2 Duo? >>>>> >>>>> From what I can see in the kernel sources, there is a generic AES >>>>> implementation using nothing but portable C code and then there is >>>>> "aes-i586" assembler code with "aes_glue" C code. >>>> >>>> >>>>> So I assume the i586 >>>>> version is better for you --- unless GCC suddenly got a lot better at >>>>> optimizing code. >>>> >>>> Since when, exactly? GCC isn't the best compiler at optimization, but >>>> I fully expect current versions to produce better code for x86-64 than >>>> hand-tuned i586. Wider registers, more registers, crypto acceleration >>>> instructions and SIMD instructions are all very nice to have. 
I don't >>>> know the specifics of AES, though, or what kind of crypto algorithm it >>>> is, so it's entirely possible that one can't effectively parallelize >>>> it except in some relatively unique circumstances. >>>> >>> >>> One sec. We are talking about an Core2 Duo running in 32bit mode, right? >>> That's what the i686 reference in the question meant --- or at least, >>> that's what I assumed. >> >> I think you're right; I missed that part. >> >>> >>> If we talk about 32bit mode, none of what you describe is available. >>> Those additional registers and instructions are not accessible with i686 >>> instructions. A Core 2 also has no AES instructions. >>> >>> Of course, GCC could make use of what it knows about the CPU, like >>> number of parallel pipelines, pipeline depth, cache size, instructions >>> added in i686 and so on. But even then I doubt it can outperform >>> hand-tuned assembler, even if it is for a slightly older instruction set. >> >> I'm still not sure why. I'll posit that some badly-written C could >> place constraints on the compiler's optimizer, but GCC should have >> little problem handling well-written C, separating semantics from >> syntax and finding good transforms of the original code to get >> proofably-same results. Unless I'm grossly overestimating the >> capabilities of its AST processing and optimization engine. >> > > Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a > profiling run to allow the compiler to predict loop and jump certainties > and so on. I was thinking more in the context of simple functions and mathematical operations. Loop probabilities? Yeah, that's a tough one. Nobody wants to stall a huge CPU pipeline. I remember when the NetBurst architecture came out. Intel cranked up the amount of die space dedicated to branch prediction... > > But, by all means, let's test it! It's not like we cannot. > Unfortunately, I don't have a 32bit Gentoo machine at hand where I could > test it right now. 
Now we're talking. :) Unfortunately, I don't have a 32-bit Gentoo environment available, either. Actually, I've never run Gentoo in a 32-bit environment. >.> -- :wq ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 18:58 ` Florian Philipp 2012-03-13 19:13 ` Michael Mol @ 2012-03-13 19:18 ` Florian Philipp 2012-03-13 21:05 ` Frank Steinmetzger 2 siblings, 0 replies; 22+ messages in thread From: Florian Philipp @ 2012-03-13 19:18 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 3133 bytes --] Am 13.03.2012 19:58, schrieb Florian Philipp: > Am 13.03.2012 19:18, schrieb Michael Mol: >> On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp <lists@binarywings.net> wrote: >>> Am 13.03.2012 18:45, schrieb Frank Steinmetzger: >>>> On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote: >>>> >>>>>> Since I am planning to encrypt only home/ under LVM control, what kind >>>>>> of overhead should I expect? >>>>> >>>>> What do you mean with overhead? CPU utilization? In that case the >>>>> overhead is minimal, especially when you run a 64-bit kernel with the >>>>> optimized AES kernel module. >>>> >>>> Speaking of that... >>>> I always wondered what the exact difference was between AES and AES i586. I >>>> can gather myself that it's about optimisation for a specific architecture. >>>> But which one would be best for my i686 Core 2 Duo? >>> >>> From what I can see in the kernel sources, there is a generic AES >>> implementation using nothing but portable C code and then there is >>> "aes-i586" assembler code with "aes_glue" C code. >> >> >>> So I assume the i586 >>> version is better for you --- unless GCC suddenly got a lot better at >>> optimizing code. >> >> Since when, exactly? GCC isn't the best compiler at optimization, but >> I fully expect current versions to produce better code for x86-64 than >> hand-tuned i586. Wider registers, more registers, crypto acceleration >> instructions and SIMD instructions are all very nice to have. 
I don't >> know the specifics of AES, though, or what kind of crypto algorithm it >> is, so it's entirely possible that one can't effectively parallelize >> it except in some relatively unique circumstances. >> > > One sec. We are talking about an Core2 Duo running in 32bit mode, right? > That's what the i686 reference in the question meant --- or at least, > that's what I assumed. > > If we talk about 32bit mode, none of what you describe is available. > Those additional registers and instructions are not accessible with i686 > instructions. A Core 2 also has no AES instructions. > > Of course, GCC could make use of what it knows about the CPU, like > number of parallel pipelines, pipeline depth, cache size, instructions > added in i686 and so on. But even then I doubt it can outperform > hand-tuned assembler, even if it is for a slightly older instruction set. > P.S.: I just looked up the differences between the i586 and i686 instruction sets. The only significant instruction added in i686 was the conditional move (CMOV), which helps to avoid conditional jumps. However, the aes-i586 code contains only two conditional jumps, and they both just end the loop of encryption/decryption rounds for AES-128 and AES-256, respectively. My assembler isn't perfect, but I doubt you can optimize that away with a CMOV. > If instead we are talking about an Core 2 Duo running in x86_64 mode, we > should be talking about the aes-x86_64 module instead of the aes-i586 > module and that makes use of the complete instruction set of the Core 2, > including SSE2. > > Regards, > Florian Philipp [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 18:58 ` Florian Philipp 2012-03-13 19:13 ` Michael Mol 2012-03-13 19:18 ` Florian Philipp @ 2012-03-13 21:05 ` Frank Steinmetzger 2 siblings, 0 replies; 22+ messages in thread From: Frank Steinmetzger @ 2012-03-13 21:05 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1657 bytes --] On Tue, Mar 13, 2012 at 07:58:55PM +0100, Florian Philipp wrote: > >> From what I can see in the kernel sources, there is a generic AES > >> implementation using nothing but portable C code and then there is > >> "aes-i586" assembler code with "aes_glue" C code. > > > >> So I assume the i586 > >> version is better for you --- unless GCC suddenly got a lot better at > >> optimizing code. > > > > Since when, exactly? GCC isn't the best compiler at optimization, but > > I fully expect current versions to produce better code for x86-64 than > > hand-tuned i586. Wider registers, more registers, crypto acceleration > > instructions and SIMD instructions are all very nice to have. I don't > > know the specifics of AES, though, or what kind of crypto algorithm it > > is, so it's entirely possible that one can't effectively parallelize > > it except in some relatively unique circumstances. > > > > One sec. We are talking about an Core2 Duo running in 32bit mode, right? > That's what the i686 reference in the question meant --- or at least, > that's what I assumed. Sorry, I forgot to mention that I'm running 32 bit, yes. I don't really see the benefit of 64 bit for my use case. For all I know, the executables get bigger and my poor old laptop will have to shuffle more bits around. :) However, hardware AES would be *the* reason for me to, instead of a netbook, buy something with an i5 in my next laptop, some time in the distant future. -- Gruß | Greetings | Qapla' I forbid any use of my email addresses with Facebook services. Ein Computer stürzt nur ab, wenn der Text lange nicht gespeichert wurde. 
* Re: [gentoo-user] hard drive encryption 2012-03-13 18:18 ` Michael Mol 2012-03-13 18:58 ` Florian Philipp @ 2012-03-13 19:07 ` Stroller 2012-03-13 19:38 ` Michael Mol 2012-03-13 20:02 ` Florian Philipp 1 sibling, 2 replies; 22+ messages in thread From: Stroller @ 2012-03-13 19:07 UTC (permalink / raw To: gentoo-user On 13 March 2012, at 18:18, Michael Mol wrote: > ... >> So I assume the i586 >> version is better for you --- unless GCC suddenly got a lot better at >> optimizing code. > > Since when, exactly? GCC isn't the best compiler at optimization, but > I fully expect current versions to produce better code for x86-64 than > hand-tuned i586. Wider registers, more registers, crypto acceleration > instructions and SIMD instructions are all very nice to have. I don't > know the specifics of AES, though, or what kind of crypto algorithm it > is, so it's entirely possible that one can't effectively parallelize > it except in some relatively unique circumstances. Do you have much experience of writing assembler? I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years. What I've read often has the programmer looking at the compiled gcc output and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks. Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once.
Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate gain to be considerable. The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines. Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one. Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c. 25 - 60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile. Stroller. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 19:07 ` Stroller @ 2012-03-13 19:38 ` Michael Mol 2012-03-13 20:15 ` Florian Philipp 2012-03-13 20:02 ` Florian Philipp 1 sibling, 1 reply; 22+ messages in thread From: Michael Mol @ 2012-03-13 19:38 UTC (permalink / raw To: gentoo-user On Tue, Mar 13, 2012 at 3:07 PM, Stroller <stroller@stellar.eclipse.co.uk> wrote: > > On 13 March 2012, at 18:18, Michael Mol wrote: >> ... >>> So I assume the i586 >>> version is better for you --- unless GCC suddenly got a lot better at >>> optimizing code. >> >> Since when, exactly? GCC isn't the best compiler at optimization, but >> I fully expect current versions to produce better code for x86-64 than >> hand-tuned i586. Wider registers, more registers, crypto acceleration >> instructions and SIMD instructions are all very nice to have. I don't >> know the specifics of AES, though, or what kind of crypto algorithm it >> is, so it's entirely possible that one can't effectively parallelize >> it except in some relatively unique circumstances. > > Do you have much experience of writing assembler? > > I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years. Similar level of experience here. I can read it, even debug it from time to time. A few regular bloggers on the subject are like candy. And I used to have pagetable.org, Ars's Technopaedia and specsheets for early x86 and motorola processors memorized. For the past couple years, I've been focusing on reading blogs of language and compiler authors, academics involved in proofing, testing and improving them, etc. > > What I've read often has the programmer looking at the compiled gcc bytecode and examining what it does. 
The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for degugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks. Sure; it's cheaper to null out a register by XORing it with itself than setting it to 0. > > Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate to be much more considerable. Sure; optimize the hell out of the code where you spend most of your time. I wasn't aware that gcc passed up on safe optimization opportunities, though. > > The operations in question may only be constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines. I disagree. With defined semantics for the source and target, a computer's cleverness is limited only by the computational and memory expense of its search algorithms. Humans get through this by making habit various optimizations, but those habits become less useful as additional paths and instructions are added. As system complexity increases, humans operate on personally cached techniques derived from simpler systems. 
I would expect very, very few people to be intimately familiar with the majority of optimization possibilities present on an amdfam10 processor or a core2. Compilers aren't necessarily familiar with them, either; they're just quicker at discovering them, given knowledge of the individual instructions and the rules of language semantics. > > Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one. I'd be willing to posit that you're right here, though if there isn't a per-byte feedback mechanism, SIMD instructions would come into serious play. But I expect there's a per-byte feedback mechanism, so parallelization would likely come in the form of processing simultaneous streams. > > Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c 25 - 60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile. Absolutely. My position, though, is that compilers are quicker and more capable of discovering optimization possibilities than humans are, when the target architecture changes. Sure, you've got several dozen video codecs in, say, ffmpeg, and perhaps it all boils down to less than a dozen very common cases of inner loop code.
With hand-tuned optimization, you'd need to fork your assembly patch for each new processor feature that comes out, and then work to find the most efficient way to execute code on that processor. There are also cases where processor features change. I don't remember the name of the instruction (it had something to do with stack operations) in x86, but Intel switched it from a 0-cycle instruction to something more expensive. Any code which assumed that instruction was a 0-cycle instruction now became less efficient. A compiler (presuming it has knowledge of the target processor's instruction set properties) would have an easier time coping with that change than a human would. I'm not saying humans are useless; this is just one of those areas which is sufficiently complex-yet-deterministic that sufficient knowledge of the source and target environments would give a computer the edge over a human in finding the optimal sequence of CPU instructions. -- :wq ^ permalink raw reply [flat|nested] 22+ messages in thread
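The remark above about a feedback mechanism can be made concrete with a small sketch. It is only an illustration under loud assumptions: `toy_encrypt_block` is a made-up, insecure stand-in for a real block cipher, and `encrypt_independent` / `encrypt_chained` are hypothetical names, not kernel or dm-crypt functions. The sketch shows that when each block is processed on its own, blocks can be handled in any order (or side by side), whereas feeding the previous output into the next block forces sequential processing within one stream.

```python
BLOCK_SIZE = 16  # bytes; matches the 128-bit block size of AES


def toy_encrypt_block(block, key):
    """Made-up, insecure block function: XOR with the key, then rotate by one byte."""
    mixed = bytes(b ^ k for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def encrypt_independent(blocks, key):
    """No feedback: every block depends only on itself and the key."""
    return [toy_encrypt_block(block, key) for block in blocks]


def encrypt_chained(blocks, key, iv):
    """Feedback: each block is XORed with the previous output before encryption."""
    previous = iv
    result = []
    for block in blocks:
        fed = bytes(b ^ p for b, p in zip(block, previous))
        previous = toy_encrypt_block(fed, key)
        result.append(previous)
    return result


key = bytes(range(BLOCK_SIZE))
iv = bytes(BLOCK_SIZE)
stream_a = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE]
stream_b = [b"X" * BLOCK_SIZE, b"B" * BLOCK_SIZE]  # differs only in block 0
```

With the two sample streams, the second output block is identical in the independent variant but differs in the chained variant, because the chained result for block 1 depends on block 0. That dependency is what limits parallelism to separate streams.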
* Re: [gentoo-user] hard drive encryption 2012-03-13 19:38 ` Michael Mol @ 2012-03-13 20:15 ` Florian Philipp 2012-03-13 20:22 ` Florian Philipp 0 siblings, 1 reply; 22+ messages in thread From: Florian Philipp @ 2012-03-13 20:15 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 7255 bytes --] Am 13.03.2012 20:38, schrieb Michael Mol: > On Tue, Mar 13, 2012 at 3:07 PM, Stroller > <stroller@stellar.eclipse.co.uk> wrote: >> >> On 13 March 2012, at 18:18, Michael Mol wrote: >>> ... >>>> So I assume the i586 version is better for you --- unless GCC >>>> suddenly got a lot better at optimizing code. >>> >>> Since when, exactly? GCC isn't the best compiler at optimization, >>> but I fully expect current versions to produce better code for >>> x86-64 than hand-tuned i586. Wider registers, more registers, >>> crypto acceleration instructions and SIMD instructions are all >>> very nice to have. I don't know the specifics of AES, though, or >>> what kind of crypto algorithm it is, so it's entirely possible >>> that one can't effectively parallelize it except in some >>> relatively unique circumstances. >> >> Do you have much experience of writing assembler? >> >> I don't, and I'm not an expert on this, but I've read the odd blog >> article on this subject over the years. > > Similar level of experience here. I can read it, even debug it from > time to time. A few regular bloggers on the subject are like candy. > And I used to have pagetable.org, Ars's Technopaedia and specsheets > for early x86 and motorola processors memorized. For the past couple > years, I've been focusing on reading blogs of language and compiler > authors, academics involved in proofing, testing and improving them, > etc. > >> >> What I've read often has the programmer looking at the compiled gcc >> bytecode and examining what it does. 
The compiler might not care >> how many registers it uses, and thus a variable might find itself >> frequently swapped back into RAM; the programmer does not have any >> control over the compiler, and IIRC some flags reserve a register >> for degugging (IIRC -fomit-frame-pointer disables this). I think >> it's possible to use registers more efficiently by swapping them >> (??) or by using bitwise comparisons and other tricks. > > Sure; it's cheaper to null out a register by XORing it with itself > than setting it to 0. > >> >> Assembler optimisation is only used on sections of code that are at >> the core of a loop - that are called hundreds or thousands (even >> millions?) of times during the program's execution. It's not for >> code, such as reading the .config file or initialisation, which is >> only called once. Because the code in the core of the loop is >> called so often, you don't have to achieve much of an optimisation >> for the aggregate to be much more considerable. > > Sure; optimize the hell out of the code where you spend most of your > time. I wasn't aware that gcc passed up on safe optimization > opportunities, though. > >> >> The operations in question may only be constitute a few lines of C, >> or a handful of machine operations, so it boils down to an >> algorithm that a human programmer is capable of getting a grip on >> and comprehending. Whilst compilers are clearly more efficient for >> large programs, on this micro scale, humans are more clever and >> creative than machines. > > I disagree. With defined semantics for the source and target, a > computer's cleverness is limited only by the computational and > memory expense of its search algorithms. Humans get through this by > making habit various optimizations, but those habits become less > useful as additional paths and instructions are added. As system > complexity increases, humans operate on personally cached techniques > derived from simpler systems. 
I would expect very, very few people to > be intimately familiar with the the majority of optimization > possibilities present on an amdfam10 processor or a core2. Compiler's > aren't necessarily familiar with them, either; they're just quicker > at discovering them, given knowledge of the individual instructions > and the rules of language semantics. > >> >> Encryption / decryption is an example of code that lends itself to >> this kind of optimisation. In particular AES was designed, I >> believe, to be amenable to implementation in this way. The reason >> for that was that it was desirable to have it run on embedded >> devices and on dedicated chips. So it boils down to a simple >> bitswap operation (??) - the plaintext is modified by the >> encryption key, input and output as a fast stream. Each byte goes >> in, each byte goes out, the same function performed on each one. > > I'd be willing to posit that you're right here, though if there > isn't a per-byte feedback mechanism, SIMD instructions would come > into serious play. But I expect there's a per-byte feedback > mechanism, so parallelization would likely come in the form of > processing simultaneous streams. > >> >> Another operation that lends itself to assembler optimisation is >> video decoding - the video is encoded only once, and then may be >> played back hundreds or millions of times by different people. The >> same operations must be repeated a number of times on each frame, >> then c 25 - 60 frames are decoded per second, so at least 90,000 >> frames per hour. Again, the smallest optimisation is worthwhile. > > Absolutely. My position, though, is that compilers are quicker and > more capable of discovering optimization possibilities than humans > are, when the target architecture changes. Sure, you've got several > dozen video codecs in, say, ffmpeg, and perhaps it all boils down to > less than a dozen very common cases of inner loop code. 
With > hand-tuned optimization, you'd need to fork your assembly patch for > each new processor feature that comes out, and then work to find the > most efficient way to execute code on that processor. > > There's also cases where processor features get changed. I don't > remember the name of the instruction (it had something to do with > stack operations) in x86, but Intel switched it from a 0-cycle > instruction to something more expensive. Any code which assumed that > instruction was a 0-cycle instruction now became less efficient. A > compiler (presuming it has a knowledge of the target processor's > instruction set properties) would have an easier time coping with > that change than a human would. > > I'm not saying humans are useless; this is just one of those areas > which is sufficiently complex-yet-deterministic that sufficient > knowledge of the source and target environments would give a > computer the edge over a human in finding the optimal sequence of > CPU instructions. > This thread is becoming ridiculously long. Just as a last side-note: One of the primary reasons that the IA64 architecture failed was that it relied on the compiler to optimize the code in order to exploit the massive instruction-level parallelism the CPU offered. Compilers never became good enough for the job. Of course, that happened in the nineties and we have much better compilers now (and x86 is easier to handle for compilers). But on the other hand: That was Intel's next big thing and if they couldn't make the compilers work, I have no reason to believe in their efficiency now. Regards, Florian Philipp [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 20:15 ` Florian Philipp @ 2012-03-13 20:22 ` Florian Philipp 0 siblings, 0 replies; 22+ messages in thread From: Florian Philipp @ 2012-03-13 20:22 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 863 bytes --] > > This thread is becoming ridiculously long. Just as a last side-note: > > One of the primary reasons that the IA64 architecture failed was that it > relied on the compiler to optimize the code in order to exploit the > massive instruction-level parallelism the CPU offered. Compilers never > became good enough for the job. Of course, that happened in the > nineties and we have much better compilers now (and x86 is easier to > handle for compilers). But on the other hand: That was Intel's next big > thing and if they couldn't make the compilers work, I have no reason to > believe in their efficiency now. > > Regards, > Florian Philipp Argh, just as I want to quit: I had the dates garbled. IA64 came out in 2001, but the compiler design was of course a product of the late nineties, and the design process started in the mid-nineties. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [gentoo-user] hard drive encryption 2012-03-13 19:07 ` Stroller 2012-03-13 19:38 ` Michael Mol @ 2012-03-13 20:02 ` Florian Philipp 1 sibling, 0 replies; 22+ messages in thread From: Florian Philipp @ 2012-03-13 20:02 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 4865 bytes --] Am 13.03.2012 20:07, schrieb Stroller: > > On 13 March 2012, at 18:18, Michael Mol wrote: >> ... >>> So I assume the i586 version is better for you --- unless GCC >>> suddenly got a lot better at optimizing code. >> >> Since when, exactly? GCC isn't the best compiler at optimization, >> but I fully expect current versions to produce better code for >> x86-64 than hand-tuned i586. Wider registers, more registers, >> crypto acceleration instructions and SIMD instructions are all very >> nice to have. I don't know the specifics of AES, though, or what >> kind of crypto algorithm it is, so it's entirely possible that one >> can't effectively parallelize it except in some relatively unique >> circumstances. > > Do you have much experience of writing assembler? > > I don't, and I'm not an expert on this, but I've read the odd blog > article on this subject over the years. > > What I've read often has the programmer looking at the compiled gcc > bytecode and examining what it does. The compiler might not care how > many registers it uses, and thus a variable might find itself > frequently swapped back into RAM; the programmer does not have any > control over the compiler, and IIRC some flags reserve a register for > degugging (IIRC -fomit-frame-pointer disables this). I think it's > possible to use registers more efficiently by swapping them (??) or > by using bitwise comparisons and other tricks. > You recall correctly about the frame pointer. Concerning the register usage: I'm no expert in this field, either, but I think the main issue is not simply register allocation but branch and exception prediction and so on. 
The compiler can either optimize for a seamless continuation if the jump happens or if it doesn't. A human or a just-in-time compiler can better handle these cases by predicting the outcome or, in the case of a JIT, by analyzing the outcome of the first few iterations. OT: IIRC, register reuse is also the main performance problem of state-of-the-art JavaScript engines at the moment. Concerning the code they compile at runtime, they are nearly as good as `gcc -O0`, but they have the same problem concerning registers (GCC with -O0 produces code that works exactly as you describe above: storing the result after every computation and loading it again). > Assembler optimisation is only used on sections of code that are at > the core of a loop - that are called hundreds or thousands (even > millions?) of times during the program's execution. It's not for > code, such as reading the .config file or initialisation, which is > only called once. Because the code in the core of the loop is called > so often, you don't have to achieve much of an optimisation for the > aggregate to be much more considerable. > > The operations in question may only be constitute a few lines of C, > or a handful of machine operations, so it boils down to an algorithm > that a human programmer is capable of getting a grip on and > comprehending. Whilst compilers are clearly more efficient for large > programs, on this micro scale, humans are more clever and creative > than machines. > > Encryption / decryption is an example of code that lends itself to > this kind of optimisation. In particular AES was designed, I believe, > to be amenable to implementation in this way. The reason for that was > that it was desirable to have it run on embedded devices and on > dedicated chips. So it boils down to a simple bitswap operation (??) > - the plaintext is modified by the encryption key, input and output > as a fast stream. Each byte goes in, each byte goes out, the same > function performed on each one.
> Well, sort of. First off, you are right, AES was designed with hardware implementations in mind. The algorithm boils down to a number of substitution and permutation steps and XOR operations (I assume that's what you meant with bitswap). If you look at the portable C code (/usr/src/linux/crypto/aes_generic.c), you can see that it mostly consists of lookup tables and XORs. The thing about "each byte goes in, each byte goes out", however, is a bit wrong. What you are thinking of is a stream cipher like RC4. AES is a block cipher. These take an input block (in this case 128 bits long), XOR it with the encryption (sub-)key and shuffle it around according to the exact algorithm. > Another operation that lends itself to assembler optimisation is > video decoding - the video is encoded only once, and then may be > played back hundreds or millions of times by different people. The > same operations must be repeated a number of times on each frame, > then c 25 - 60 frames are decoded per second, so at least 90,000 > frames per hour. Again, the smallest optimisation is worthwhile. > > Stroller. > > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
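To make the stream-versus-block distinction from the message above a bit more concrete, here is a minimal Python sketch. It is a toy and not AES: the names TOY_SBOX, TOY_PERM, toy_stream_xor and toy_block_round are made up for this illustration, and the lookup table and the byte permutation are arbitrary invertible maps, not the real AES S-box or ShiftRows. It only shows the shape of the two approaches: a stream cipher handles one byte at a time, while a block cipher round takes a whole 16-byte block, XORs it with a (sub-)key, runs it through a lookup table and shuffles the bytes.

```python
BLOCK_SIZE = 16  # AES operates on 128-bit (16-byte) blocks

# Toy substitution table (assumption: NOT the real AES S-box).
# 7 is odd, so x -> 7*x + 3 (mod 256) is a permutation of 0..255.
TOY_SBOX = bytes((7 * x + 3) % 256 for x in range(256))

# Toy byte shuffle (assumption: NOT AES ShiftRows).
# 5 is coprime to 16, so i -> 5*i (mod 16) is a permutation of 0..15.
TOY_PERM = [(5 * i) % BLOCK_SIZE for i in range(BLOCK_SIZE)]


def toy_stream_xor(data: bytes, keystream: bytes) -> bytes:
    """Stream-cipher style: every byte is XORed with one keystream byte."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


def toy_block_round(block: bytes, subkey: bytes) -> bytes:
    """Block-cipher style: one toy round over a whole 16-byte block."""
    if len(block) != BLOCK_SIZE or len(subkey) != BLOCK_SIZE:
        raise ValueError("block and subkey must be exactly 16 bytes long")
    keyed = bytes(b ^ k for b, k in zip(block, subkey))   # XOR with the (sub-)key
    substituted = bytes(TOY_SBOX[b] for b in keyed)       # table lookup
    # Output position i takes the substituted byte from position TOY_PERM[i].
    return bytes(substituted[TOY_PERM[i]] for i in range(BLOCK_SIZE))


if __name__ == "__main__":
    # Any length works for the stream variant ...
    print(toy_stream_xor(b"abc", b"\x01\x02\x03").hex())
    # ... while the block variant only accepts full 16-byte blocks.
    print(toy_block_round(bytes(16), bytes(range(16))).hex())
```

A real block cipher repeats such rounds several times with different subkeys and adds a mixing step so that every output byte depends on every input byte; that part is left out here to keep the sketch short.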
end of thread, other threads:[~2012-03-13 21:06 UTC | newest] Thread overview: 22+ messages 2012-03-11 15:38 [gentoo-user] hard drive encryption Valmor de Almeida 2012-03-11 18:29 ` Florian Philipp 2012-03-13 11:55 ` Valmor de Almeida 2012-03-13 16:11 ` Florian Philipp 2012-03-13 16:26 ` Michael Mol 2012-03-13 16:49 ` Florian Philipp 2012-03-13 16:54 ` Neil Bothwick 2012-03-13 16:54 ` Michael Mol 2012-03-13 17:45 ` Frank Steinmetzger 2012-03-13 18:06 ` Florian Philipp 2012-03-13 18:18 ` Michael Mol 2012-03-13 18:58 ` Florian Philipp 2012-03-13 19:13 ` Michael Mol 2012-03-13 19:30 ` Florian Philipp 2012-03-13 19:42 ` Michael Mol 2012-03-13 19:18 ` Florian Philipp 2012-03-13 21:05 ` Frank Steinmetzger 2012-03-13 19:07 ` Stroller 2012-03-13 19:38 ` Michael Mol 2012-03-13 20:15 ` Florian Philipp 2012-03-13 20:22 ` Florian Philipp 2012-03-13 20:02 ` Florian Philipp