public inbox for gentoo-dev@lists.gentoo.org
 help / color / mirror / Atom feed
* [gentoo-dev] RFC: sed script redundancy
@ 2011-05-20 15:39 Jeroen Roovers
  2011-05-20 15:56 ` Fabian Groffen
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jeroen Roovers @ 2011-05-20 15:39 UTC (permalink / raw
  To: gentoo-dev

      Hullo developers,


for a while now I've been wondering if all those sed scripts in all
those ebuilds are really effective.

To find out, I've tried a couple of angles on a sed hook that basically
dissects the sed command line provided, divides everything up into sed
scripts, files being processed and other options, and runs everything
through diff to get some meaningful QA output as to the effective use
of the sed scripts invoked.

Of course some of the time a sed script falsely seems to be ineffective,
but could be, when it uses some variable or output that varies depending
on the platform you run it on, like with the likes of $(get_libdir).

I've looked into sed's internal solutions to no avail, but something
like -i[SUFFIX] might help, since it gives you a backup file to compare
with the file that's being streamed.

The idea is to pass the result to
  | diff -u $file $file[SUFFIX]
to figure out what was changed, and what sed script changed it.

Any help?


     jer


PS: Because the outcome may depend on the platform you run the scripts
on, this probably shouldn't make it into a QA test in portage, but it
could still help developers evaluate how effective their ebuilds' and
eclasses' sed scripts are.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-20 15:39 [gentoo-dev] RFC: sed script redundancy Jeroen Roovers
@ 2011-05-20 15:56 ` Fabian Groffen
  2011-05-21 17:34   ` Jeroen Roovers
  2011-10-28  1:08 ` [gentoo-dev] " Ryan Hill
  2013-06-10  3:47 ` [gentoo-dev] " Jeroen Roovers
  2 siblings, 1 reply; 9+ messages in thread
From: Fabian Groffen @ 2011-05-20 15:56 UTC (permalink / raw
  To: gentoo-dev

On 20-05-2011 17:39:22 +0200, Jeroen Roovers wrote:
> I've looked into sed's internal solutions to no avail, but something
> like -i[SUFFIX] might help, since it gives you a backup file to compare
> with the file that's being streamed.
> 
> The idea is to pass the result to
>   | diff -u $file $file[SUFFIX]
> to figure out what was changed, and what sed script changed it.

I like your idea a lot.

I had to do a similar thing once in eapify, and back then I came up with
the following hack:

  sed -e "<pattern>" "${file}" | diff "${file}" -

followed by the actual sed -i -e ...

This way I didn't need to write an intermediate file.


-- 
Fabian Groffen
Gentoo on a different level



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-20 15:56 ` Fabian Groffen
@ 2011-05-21 17:34   ` Jeroen Roovers
  2011-05-22 10:50     ` Fabian Groffen
  0 siblings, 1 reply; 9+ messages in thread
From: Jeroen Roovers @ 2011-05-21 17:34 UTC (permalink / raw
  To: gentoo-dev

On Fri, 20 May 2011 17:56:00 +0200
Fabian Groffen <grobian@gentoo.org> wrote:

>   sed -e "<pattern>" "${file}" | diff "${file}" -
> 
> followed by the actual sed -i -e ...
> 
> This way I didn't need to write an intermediate file.

The problem there is that sed might be called just once on any one file,
but in the tree it is often invoked with multiple scripts, so this
simple implementation lacks a way to evaluate which sed scripts are
useful.

Also, how do I ensure the sed replacement works only on invocations
inside the ebuild, and not, say, in portage's internals?


     jer



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-21 17:34   ` Jeroen Roovers
@ 2011-05-22 10:50     ` Fabian Groffen
  2011-05-29 10:44       ` Christopher Schwan
  0 siblings, 1 reply; 9+ messages in thread
From: Fabian Groffen @ 2011-05-22 10:50 UTC (permalink / raw
  To: gentoo-dev

On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote:
> On Fri, 20 May 2011 17:56:00 +0200
> Fabian Groffen <grobian@gentoo.org> wrote:
> 
> >   sed -e "<pattern>" "${file}" | diff "${file}" -
> > 
> > followed by the actual sed -i -e ...
> > 
> > This way I didn't need to write an intermediate file.
> 
> The problem there is that sed might be called just once on any one file,
> but in the tree it is often invoked with multiple scripts, so this
> simple implementation lacks a way to evaluate which sed scripts are
> useful.
> 
> Also, how do I ensure the sed replacement works only on invocations
> inside the ebuild, and not, say, in portage's internals?

(not tested, but as proof of concept)

alias sed my_sed
my_sed() {
	local oargs="${@}"
	local arg
	local nargs=()
	local hadi=
	local hade=
	while [[ -n $1 ]] ; do
		case "$1" in
			-i)
				# ignore this flag
				hadi=yes
				;;
			-e|-f)
				shift
				nargs+=( "-e$1" )
				hade=yes
				;;
			-*)
				nargs+=( "$1" )
				hade=yes
				;;
			*)
				if [[ -z ${hade} ]] ; then
					nargs+=( "$1" )
				elif [[ -z ${hadi} ]] ; then
					# there is no inline replacing, not much we can do
					break
				else
					sed "${nargs[@]}" "$1" | diff -q "$1" - > /dev/null \
						&& ewarn "sed ${oargs} has no effect on $1" 
				fi
				;;
		esac
		shift
	done

	\sed "${oargs}"
}


-- 
Fabian Groffen
Gentoo on a different level



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-22 10:50     ` Fabian Groffen
@ 2011-05-29 10:44       ` Christopher Schwan
  2011-05-29 11:00         ` Fabian Groffen
  0 siblings, 1 reply; 9+ messages in thread
From: Christopher Schwan @ 2011-05-29 10:44 UTC (permalink / raw
  To: gentoo-dev

Thank you for that script. I experimented a bit with it and have a number of 
corrections and suggestions:

- alias does not work because my_sed is not declared at this stage. I removed 
the whole alias line because I want to selectively enable my_sed
- oargs must be an array in order to make quoting work:

   local oargs=( "${@}" )

- In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
- is it correct to treat -e and -f alike ? I am not sure about that, because 
the latter expects a file
- If no "-e" is given, the first non-option argument is treated as the sed-
script-expression, therefore I added hade=yes in the if-branch

The new function now reads:

my_sed() {
	local oargs=( "$@" )
	local arg
	local nargs=()
	local hadi=
	local hade=

	while [[ -n $1 ]] ; do
		case "$1" in
		-i|--in-place)
			# ignore this flag
			hadi=yes
			;;
		-e|--expression)
			shift
			nargs+=( "-e" "$1" )
			hade=yes
			;;
		-f|--file)
			shift
			nargs+=( "-f" "$1" )
			hade=yes
			;;
		-*)
			nargs+=( "$1" )
			;;
		*)
			if [[ -z ${hade} ]] ; then
				nargs+=( "-e" "$1" )
				hade=yes
			elif [[ -z ${hadi} ]] ; then
				# there is no inline replacing, not much we can do
				break
			else
				sed "${nargs[@]}" "$1" | diff -q "$1" - > /dev/null && \
					ewarn "sed ${nargs[@]} has no effect on $1" 
			fi
			;;
		esac
		shift
	done

	sed "${oargs[@]}"
}

As you can see, I added support for long-options. However, testing the 
individual sed commands remains to be done. This could be especially difficult 
if input is taken from stdin (e.g. in cat foo | sed "s:a:b:g").

I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed 
commands and I was able to spot one useless sed.



[1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
mathematics/sage/sage-4.7.ebuild


On Sunday 22 May 2011 12:50:43 Fabian Groffen wrote:
> On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote:
> > On Fri, 20 May 2011 17:56:00 +0200
> > 
> > Fabian Groffen <grobian@gentoo.org> wrote:
> > >   sed -e "<pattern>" "${file}" | diff "${file}" -
> > > 
> > > followed by the actual sed -i -e ...
> > > 
> > > This way I didn't need to write an intermediate file.
> > 
> > The problem there is that sed might be called just once on any one file,
> > but in the tree it is often invoked with multiple scripts, so this
> > simple implementation lacks a way to evaluate which sed scripts are
> > useful.
> > 
> > Also, how do I ensure the sed replacement works only on invocations
> > inside the ebuild, and not, say, in portage's internals?
> 
> (not tested, but as proof of concept)
> 
> alias sed my_sed
> my_sed() {
> 	local oargs="${@}"
> 	local arg
> 	local nargs=()
> 	local hadi=
> 	local hade=
> 	while [[ -n $1 ]] ; do
> 		case "$1" in
> 			-i)
> 				# ignore this flag
> 				hadi=yes
> 				;;
> 			-e|-f)
> 				shift
> 				nargs+=( "-e$1" )
> 				hade=yes
> 				;;
> 			-*)
> 				nargs+=( "$1" )
> 				hade=yes
> 				;;
> 			*)
> 				if [[ -z ${hade} ]] ; then
> 					nargs+=( "$1" )
> 				elif [[ -z ${hadi} ]] ; then
> 					# there is no inline replacing, not much we can do
> 					break
> 				else
> 					sed "${nargs[@]}" "$1" | diff -q "$1" - > /dev/null \
> 						&& ewarn "sed ${oargs} has no effect on $1"
> 				fi
> 				;;
> 		esac
> 		shift
> 	done
> 
> 	\sed "${oargs}"
> }



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-29 10:44       ` Christopher Schwan
@ 2011-05-29 11:00         ` Fabian Groffen
  2011-05-29 11:49           ` Christopher Schwan
  0 siblings, 1 reply; 9+ messages in thread
From: Fabian Groffen @ 2011-05-29 11:00 UTC (permalink / raw
  To: gentoo-dev

On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
> Thank you for that script. I experimented a bit with it and have a number of 
> corrections and suggestions:
> 
> - alias does not work because my_sed is not declared at this stage. I removed 
> the whole alias line because I want to selectively enable my_sed
> - oargs must be an array in order to make quoting work:
> 
>    local oargs=( "${@}" )
> 
> - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
> - is it correct to treat -e and -f alike ? I am not sure about that, because 
> the latter expects a file

Yes, because (also in your function) you always shift, and assume the
next argument is there.  Hence, you have two identical cases in your
script now.  I only distinguised between 1) being able to do something
(-i) and 2) having a pattern to work with (-e/-f or first non-option
argument as string pattern).

> - If no "-e" is given, the first non-option argument is treated as the sed-
> script-expression, therefore I added hade=yes in the if-branch

That one was missing indeed.  I just quickly wrote the proof of concept
:)

> The new function now reads:
> 
[snip improved function]
> 
> As you can see, I added support for long-options. However, testing the 
> individual sed commands remains to be done. This could be especially difficult 
> if input is taken from stdin (e.g. in cat foo | sed "s:a:b:g").

You might be able to detect input is a pipe, and temporarily
write the input to some file, then perform the sed without the -i
requirement and remove the temp file after the real sed.

> I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed 
> commands and I was able to spot one useless sed.

Cool, nice to see you've made it into something useful!

> [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
> mathematics/sage/sage-4.7.ebuild


-- 
Fabian Groffen
Gentoo on a different level



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-29 11:00         ` Fabian Groffen
@ 2011-05-29 11:49           ` Christopher Schwan
  0 siblings, 0 replies; 9+ messages in thread
From: Christopher Schwan @ 2011-05-29 11:49 UTC (permalink / raw
  To: gentoo-dev

On Sunday 29 May 2011 13:00:32 Fabian Groffen wrote:
> On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
> > Thank you for that script. I experimented a bit with it and have a number
> > of corrections and suggestions:
> > 
> > - alias does not work because my_sed is not declared at this stage. I
> > removed the whole alias line because I want to selectively enable my_sed
> > 
> > - oargs must be an array in order to make quoting work:
> >    local oargs=( "${@}" )
> > 
> > - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
> > - is it correct to treat -e and -f alike ? I am not sure about that,
> > because the latter expects a file
> 
> Yes, because (also in your function) you always shift, and assume the
> next argument is there.  Hence, you have two identical cases in your
> script now.  I only distinguised between 1) being able to do something
> (-i) and 2) having a pattern to work with (-e/-f or first non-option
> argument as string pattern).

Ok sorry - I did not express myself clearly. Your script is replacing "-f" 
with "-e" and I wasnt sure if this is correct. Buf of course, they can be 
treated alike:

-e|--expression|-f|--file)
	arg="$1"
	shift
	nargs+=( "${arg}" "$1" )
	hade=yes
	;;

> 
> > - If no "-e" is given, the first non-option argument is treated as the
> > sed- script-expression, therefore I added hade=yes in the if-branch
> 
> That one was missing indeed.  I just quickly wrote the proof of concept
> 
> :)
> :
> > The new function now reads:
> [snip improved function]
> 
> > As you can see, I added support for long-options. However, testing the
> > individual sed commands remains to be done. This could be especially
> > difficult if input is taken from stdin (e.g. in cat foo | sed
> > "s:a:b:g").
> 
> You might be able to detect input is a pipe, and temporarily
> write the input to some file, then perform the sed without the -i
> requirement and remove the temp file after the real sed.

Good idea, thanks!

> 
> > I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed
> > commands and I was able to spot one useless sed.
> 
> Cool, nice to see you've made it into something useful!
> 
> > [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
> > mathematics/sage/sage-4.7.ebuild



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [gentoo-dev] Re: RFC: sed script redundancy
  2011-05-20 15:39 [gentoo-dev] RFC: sed script redundancy Jeroen Roovers
  2011-05-20 15:56 ` Fabian Groffen
@ 2011-10-28  1:08 ` Ryan Hill
  2013-06-10  3:47 ` [gentoo-dev] " Jeroen Roovers
  2 siblings, 0 replies; 9+ messages in thread
From: Ryan Hill @ 2011-10-28  1:08 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1772 bytes --]

On Fri, 20 May 2011 17:39:22 +0200
Jeroen Roovers <jer@gentoo.org> wrote:

> for a while now I've been wondering if all those sed scripts in all
> those ebuilds are really effective.
> 
> To find out, I've tried a couple of angles on a sed hook that basically
> dissects the sed command line provided, divides everything up into sed
> scripts, files being processed and other options, and runs everything
> through diff to get some meaningful QA output as to the effective use
> of the sed scripts invoked.
> 
> Of course some of the time a sed script falsely seems to be ineffective,
> but could be, when it uses some variable or output that varies depending
> on the platform you run it on, like with the likes of $(get_libdir).
> 
> I've looked into sed's internal solutions to no avail, but something
> like -i[SUFFIX] might help, since it gives you a backup file to compare
> with the file that's being streamed.
> 
> The idea is to pass the result to
>   | diff -u $file $file[SUFFIX]
> to figure out what was changed, and what sed script changed it.
> 
> Any help?

Sorry, old thread. :)  You can use the 'w' flag to write a log file of lines
changed.  This includes lines changed where the replacement ended up the same
as the text matched (eg. lib->$(get_libdir)).  Which means you can do
something like:

dirtyepic@tundra ~ $ cat test
foo
foobar
bar
foobarfoo
dirtyepic@tundra ~ $ sed -i -e 's:foo:foo:gw /dev/stdout' test | wc -l
3

I think only gnu sed can do the stdout thing.

-- 
fonts, gcc-porting,                  it makes no sense how it makes no sense
toolchain, wxwidgets                           but i'll take it free anytime
@ gentoo.org                EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-dev] RFC: sed script redundancy
  2011-05-20 15:39 [gentoo-dev] RFC: sed script redundancy Jeroen Roovers
  2011-05-20 15:56 ` Fabian Groffen
  2011-10-28  1:08 ` [gentoo-dev] " Ryan Hill
@ 2013-06-10  3:47 ` Jeroen Roovers
  2 siblings, 0 replies; 9+ messages in thread
From: Jeroen Roovers @ 2013-06-10  3:47 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

On Fri, 20 May 2011 17:39:22 +0200
Jeroen Roovers <jer@gentoo.org> wrote:

> for a while now I've been wondering if all those sed scripts in all
> those ebuilds are really effective.

This took rather long to find some spare time for.

Plonk the attached bash script into /etc/portage/bashrc.d (which you
source in /etc/portage/bashrc or something) and you should probably be
on your merry way.

The script probably doesn't catch all cases, but sed use in most
ebuilds is pretty standard.


     jer

[-- Attachment #2: sed --]
[-- Type: text/plain, Size: 1209 bytes --]

#!/bin/bash
SED=$(type -P sed)

sed() {
	case ${FUNCNAME[1]} in
		# we only care about direct calls in some ebuild phases
		src_prepare|src_configure|src_compile|src_install)
			local arg args file files scripts
			while [[ $# -gt 0 ]]; do
				arg="${1}"
				if [ -f "${arg}" ]; then
					files+=" ${arg}"
				else
					case ${arg} in
						-e) : ;;
						--expression=*) scripts+=( "${arg/--expression=}" ) ;;
						-i*|--in-place=*) : ;;
						-f)
							local scriptfile="${arg}"
							shift
							scriptfile+=" ${arg}"
							scripts+=( "${scriptfile}" )
						;;
						--file=*) scripts+=( "${arg/--file=}" ) ;;
						-*) args+=" ${arg}" ;;
						*) scripts+=( "${arg}" ) ;;
					esac
				fi
				shift
			done

			for file in ${files}; do
				for idx in ${!scripts[@]}; do
					${SED} ${args} -e "${scripts[@]:${idx}:1}" ${file} > ${file}.jer-qa-sed
					if cmp -s ${file}{,.jer-qa-sed}; then
						echo -e "\033[01;33m!!! JeR-QA: sed expression "${scripts[@]:${idx}:1}" did not change ${file}\033[00;00m" \
							| tee -a ${T}/sed.log > /dev/stderr
					else
						mv ${file}{.jer-qa-sed,}
					fi
				done
			done
		;;
		*) ${SED} "${@}"
		;;
		esac
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-06-10  3:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-20 15:39 [gentoo-dev] RFC: sed script redundancy Jeroen Roovers
2011-05-20 15:56 ` Fabian Groffen
2011-05-21 17:34   ` Jeroen Roovers
2011-05-22 10:50     ` Fabian Groffen
2011-05-29 10:44       ` Christopher Schwan
2011-05-29 11:00         ` Fabian Groffen
2011-05-29 11:49           ` Christopher Schwan
2011-10-28  1:08 ` [gentoo-dev] " Ryan Hill
2013-06-10  3:47 ` [gentoo-dev] " Jeroen Roovers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox