From: gentuxx
Date: Mon, 26 Sep 2005 20:45:40 -0700
To: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
Message-ID: <4338C064.3090207@gmail.com>
List-Id: Gentoo Linux mail
Subject: [gentoo-user] Any 'sed' geniuses out there?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm writing a sed script that will parse the *broken* output of man2html. I say broken because the output isn't W3C compliant (HTML or XHTML). I'd like to modify it so that the final result is XHTML 1.0 compliant.

I'm running into a problem where the output doesn't close the [...], [...], or [...] tags. XHTML requires that tags containing text be closed. So the problem I'm having is taking note of the starting tag, grabbing the subsequent paragraph, then inserting the closing tag. What I've got /sort of/ works, but not quite.

Here's a sample that has been parsed, but not with the <p> modifying elements:

Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England. See http://www.pcre.org/ .

Nmap can optionally link to the OpenSSL cryptography toolkit, which is available from http://www.openssl.org/ .

Here's the entire sedscr (sans comments):

/^$/{
N
/^\n$/d
}
/^Content-type: text\/html/c\
[...]
s%<\(HTML\|P\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L<\1>%g
s%<\/\(HTML\|P\|A\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L[...]%g
s%[...]%[...]%g
s%[...]%[...]%g
s%<[Dd][Ll] [Cc][Oo][Mm][Pp][Aa][Cc][Tt]>%[...]%
s%[...]%[...]%g
s%[...]%[...]%g
/^<[IB]>.*$/{
N
s%\(<[IB]>\)\(.*\)\(<\/[IB]>\)\n%\L\1\2\L\3%
}
/^<[ib]>.*$/{
N
s%\n%%
}
s%<[IB]>%\L&%
s%<\/[IB]>%\L&%
/<body>/,/<\/body>/{
/<p>/!{
H
d
}
/<p>/{
x
s/$/<\/p>/
G
}
}
/^<p>$/,/<\p>$/{
N
/^\n<p>$/d
}

Here's the funkiness after parsing with the last part (/<body>/,/<\/body>/{) enabled:

Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England. See http://www.pcre.org/ .

Nmap can optionally link to the OpenSSL cryptography toolkit, which is available from http://www.openssl.org/ .
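
For comparison, the goal described above (note the starting tag, pass the paragraph through, then insert the closing tag) can be sketched with a flag kept in sed's hold space. This is only a minimal sketch, not the sedscr above: the function name close_p is hypothetical, and it assumes every <p> and </body> sits alone on its own line, which real man2html output may not guarantee.

```shell
#!/bin/sh
# close_p: hypothetical helper name, not part of the original sedscr.
# Assumption: every <p> and </body> tag sits alone on its own line.
close_p() {
  sed '
    # On each <p>: if a paragraph is already open, print </p> first.
    /^<p>$/{
      x
      /open/{
        s/.*/<\/p>/
        p
      }
      s/.*/open/
      x
    }
    # On </body>: close a paragraph that is still open, then clear the flag.
    /^<\/body>$/{
      x
      /open/{
        s/.*/<\/p>/
        p
      }
      s/.*//
      x
    }
  '
}

# Example run on a tiny hand-written document.
printf '%s\n' '<body>' '<p>' 'First paragraph.' '<p>' 'Second paragraph.' '</body>' | close_p
```

The hold space only ever carries the word "open" or nothing, so paragraph text is never buffered; each line is printed as it is read, and a </p> line is emitted just before the next <p> or before </body>.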

(Just in case you were wondering, this IS from the nmap man page. ;-)

Thanks.

- --
gentux
echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'

gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40 9795 2D81 924A 6996 0993
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDOMBkLYGSSmmWCZMRAnnrAJwKNqr+/OgBdDD8X8PXX6rpKUfaxQCfU9PW
Bs2oA/76RYFbbc7DWEpfTM8=
=gcc/
-----END PGP SIGNATURE-----
--
gentoo-user@gentoo.org mailing list