* Re: [gentoo-user] [OT] Question about duplicate lines in file
From: Matthew Cline @ 2006-06-12 17:32 UTC
To: gentoo-user
On 6/12/06, Teresa and Dale <teendale@vista-express.com> wrote:
>
> Thanks, read the man page, it was short so it didn't take long. I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
I think that you need to run sort on the file first, then uniq.
HTH,
Matt
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] [OT] Question about duplicate lines in file
From: Raymond Lewis Rebbeck @ 2006-06-12 17:37 UTC
To: gentoo-user
On Tuesday, 13 June 2006 2:49, Teresa and Dale wrote:
> Raymond Lewis Rebbeck wrote:
> >On Tuesday, 13 June 2006 2:12, Teresa and Dale wrote:
> >>Hi folks,
> >>
> >>I have batched a bunch of servers in my hosts file to block, for ads and
> >>all that crap. I got them from several different places, some I have
> >>found too, and am sure there are dups in there, same server but pasted
> >>from several sources. I am not a programmer at all and don't even really
> >>know what to search for. I would like to remove the duplicate entries
> >>and then put them in alphabetical order if I could. I would gladly then
> >>make this available if someone wanted to host it. I don't have a place
> >>to host it.
> >>
> >>Oh, there are 15,000 entries in my hosts file. O_O
> >>
> >>Could someone tell me how this is done? May even learn something here.
> >>If I can do this, I'm sure I will.
> >>
> >>Thanks.
> >>
> >>Dale
> >>
> >>:-) :-)
> >
> >'uniq' and 'sort' should do what you're after, check out the man pages.
>
> Thanks, read the man page, it was short so it didn't take long. I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
> It doesn't look like it did anything but copy the same thing over.
> There are only 2 lines missing. Do spaces count? Some put in a lot
> of spaces between the localhost and the web address. Maybe that has an
> effect?
>
> Thanks for the help. I had never seen that command before. I had heard
> of sort, never used it though. I do have those on my desktop. I'm
> playing with copies instead of my real hosts file.
>
> Thanks again.
>
> Dale
>
> :-) :-)
Yes, the spaces matter. You could use 'tr' to squeeze each run of repeated
spaces into a single space:
$ tr -s ' ' < filename
That should do it. You can then pipe the result through sort and uniq and do
whatever else you want with it.
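A minimal sketch of the squeezing step, using made-up entries (the hostname is an assumption, not taken from the real file). It also shows one possible way to handle tabs, which `tr -s ' '` alone leaves untouched:

```shell
# Hypothetical input: the same entry pasted with different spacing.
squeezed=$(printf '%s\n' \
  '127.0.0.1     ads.example.com' \
  '127.0.0.1 ads.example.com' | tr -s ' ' | sort -u)
printf '%s\n' "$squeezed"

# If some sources used tabs, convert them to spaces before squeezing.
with_tabs=$(printf '127.0.0.1\t\tads.example.com\n' | tr '\t' ' ' | tr -s ' ')
printf '%s\n' "$with_tabs"
```

Once the spacing is normalized, the two variants become byte-identical lines, so any of the de-duplication approaches in this thread can treat them as one entry.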
--
Raymond Lewis Rebbeck
* Re: [gentoo-user] [OT] Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 17:37 UTC
To: gentoo-user
On Mon, 12 Jun 2006 12:19:46 -0500, Teresa and Dale wrote:
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
uniq only removes consecutive duplicate lines, so you need to use sort first:
sort file | uniq >newfile
or, possibly, depending on the format of your file:
sort -u file >newfile
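A small sketch comparing the two forms on made-up data (the file names under a temporary directory and the hostnames are assumptions for illustration). `LC_ALL=C` is set only to make the sort order predictable:

```shell
# Hypothetical sample: one host pasted twice, with another entry in between.
tmp=$(mktemp -d)
printf '%s\n' \
  '127.0.0.1 ads.example.com' \
  '127.0.0.1 tracker.example.net' \
  '127.0.0.1 ads.example.com' > "$tmp/hosts"

# uniq on its own keeps all three lines: the duplicates are not adjacent.
unsorted_count=$(uniq "$tmp/hosts" | wc -l | tr -d ' ')

# Sorting first puts the duplicates next to each other, so uniq can drop one.
LC_ALL=C sort "$tmp/hosts" | uniq > "$tmp/newfile"

# sort -u sorts and removes duplicates in a single step.
LC_ALL=C sort -u "$tmp/hosts" > "$tmp/newfile2"

cat "$tmp/newfile"
```

For whole-line duplicates like these, both forms should produce the same two-line result.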
--
Neil Bothwick
Few women admit their age. Few men act theirs.
* Re: [gentoo-user] [OT] Question about duplicate lines in file
From: Mike Williams @ 2006-06-12 17:45 UTC
To: gentoo-user
On Monday 12 June 2006 18:19, Teresa and Dale wrote:
> Thanks, read the man page, it was short so it didn't take long. I tried
> this:
sort would be more appropriate. I don't believe uniq will find matches
anywhere in the file, only on adjacent lines, e.g.
192
195
192
wouldn't get shortened, but
192
192
195
would.
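The two cases above can be checked directly. This sketch also shows what the `-u` flag from the original attempt does: it prints only lines that are not repeated, rather than one copy of each line.

```shell
# Duplicates separated by another line: uniq leaves all three lines.
interleaved=$(printf '%s\n' 192 195 192 | uniq)

# Duplicates on adjacent lines: uniq collapses them to one.
adjacent=$(printf '%s\n' 192 192 195 | uniq)

# uniq -u drops every copy of a repeated line and keeps only the rest.
only_unrepeated=$(printf '%s\n' 192 192 195 | uniq -u)

printf '%s\n' "$interleaved" '---' "$adjacent" '---' "$only_unrepeated"
```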
--
Mike Williams
* [gentoo-user] Re: [OT] Question about duplicate lines in file
From: Christer Ekholm @ 2006-06-12 17:55 UTC
To: gentoo-user; +Cc: Teresa and Dale
Teresa and Dale <teendale@vista-express.com> writes:
>
>
> Thanks, read the man page, it was short so it didn't take long. I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
> It doesn't look like it did anything but copy the same thing over.
> There are only 2 lines missing. Do spaces count? Some put in a lot
> of spaces between the localhost and the web address. Maybe that has an
> effect?
The problem with uniq is that it (according to the manpage),
"Discard all but one of successive identical lines"
You need to have a sorted file for uniq to do what you want, or sort
it with the -u option
sort -u hosts > hostsort
If you don't want to ruin your original order you have to do something
else. This is one way of doing it with perl.
perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts > hostsort
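For comparison, a commonly used awk idiom does the same order-preserving de-duplication. This is an alternative sketch, not something proposed in the thread; the array name `seen` and the sample input are arbitrary:

```shell
# Print each line only the first time it appears, keeping the original order.
# seen[$0]++ is 0 (false) on first sight, so !seen[$0]++ is true exactly once.
deduped=$(printf '%s\n' b a b c a | awk '!seen[$0]++')
printf '%s\n' "$deduped"
```

Like the perl one-liner, this keeps one array entry per distinct line in memory, which should be no problem for a file of about 15,000 lines.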
--
Christer
* Re: [gentoo-user] Re: [OT] Question about duplicate lines in file
From: Alan McKinnon @ 2006-06-12 18:39 UTC
To: gentoo-user
On Monday 12 June 2006 19:55, Christer Ekholm wrote:
> Teresa and Dale <teendale@vista-express.com> writes:
> > Thanks, read the man page, it was short so it didn't take long.
> > I tried this:
> >
> > uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
> >
> > It doesn't look like it did anything but copy the same thing
> > over. There are only 2 lines missing. Do spaces count? Some
> > put in a lot of spaces between the localhost and the web address.
> > Maybe that has an effect?
>
> The problem with uniq is that it (according to the manpage),
>
> "Discard all but one of successive identical lines"
>
> You need to have a sorted file for uniq to do what you want, or
> sort it with the -u option
>
> sort -u hosts > hostsort
>
> If you don't want to ruin your original order you have to do
> something else. This is one way of doing it with perl.
>
> perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts > hostsort
Almost there :-)
If /etc/hosts has these lines:
127.0.0.1 localhost
127.0.0.1      localhost
uniq will see these as different even though they are actually the
same entry. So he needs something like tr to squash spaces. This will
do it (as root):
cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new
If the new file is OK, use it to overwrite /etc/hosts
Explanation so Dale knows what I'm asking him to do:
cat sends the file to tr.
tr finds every run of two or more consecutive spaces and replaces it
with one space.
sort sorts the lines.
uniq finds consecutive lines that are the same and throws away the
extra ones. The -i is there just in case two entries differ in case
only (as FQDNs are, strictly speaking, case insensitive). As mentioned
by others, uniq only matches consecutive dupes, so the list must be
sorted first.
> /etc/hosts.new writes the final output to the named disk file.
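A sketch of the same pipeline on made-up data in a temporary directory, so it can be tried without root (all hostnames are assumptions). One caveat it illustrates: in the C locale, plain sort places uppercase before lowercase, so entries that differ only in case may not end up adjacent for `uniq -i`; `sort -f` (fold case) is one way around that.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '127.0.0.1 localhost' \
  '127.0.0.1      localhost' \
  '127.0.0.1 Ads.Example.com' \
  '127.0.0.1 Zeta.example.org' \
  '127.0.0.1 ads.example.com' > "$tmp/hosts"

# The pipeline as given (reading the file directly instead of via cat).
plain_count=$(tr -s ' ' < "$tmp/hosts" | LC_ALL=C sort | uniq -i | wc -l | tr -d ' ')

# Same pipeline with a case-folding sort, so case variants become adjacent.
tr -s ' ' < "$tmp/hosts" | LC_ALL=C sort -f | uniq -i > "$tmp/hosts.new"
folded_count=$(wc -l < "$tmp/hosts.new" | tr -d ' ')

echo "plain sort: $plain_count lines, sort -f: $folded_count lines"
```

In many non-C locales the default collation already interleaves upper and lower case, in which case the two counts would likely match.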
Cheers,
alan
p.s. Those 15,000 entries in your hosts file are, um, a lot :-)
--
If only me, you and dead people understand hex,
how many people understand hex?
Alan McKinnon
alan at linuxholdings dot co dot za
+27 82, double three seven, one nine three five
* Re: [gentoo-user] Re: [OT] Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 19:15 UTC
To: gentoo-user
On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:
> If /etc/hosts has these lines:
> 127.0.0.1 localhost
> 127.0.0.1      localhost
> uniq will see these as different even though they are actually the
> same entry. So he needs something like tr to squash spaces. This will
> do it (as root):
>
> cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new
sort -u -k1,1 /etc/hosts >/etc/hosts.new
avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
(space delimited) and -u removes lines where the sort field is the same.
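A minimal sketch of what keying on the first field does, using a made-up hosts file with distinct IP addresses (the addresses and names are assumptions). Note that two lines sharing an IP collapse into one even when their host names differ:

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '192.168.0.2 beta' \
  '192.168.0.1 alpha' \
  '192.168.0.2 beta.lan' > "$tmp/hosts"

# Compare lines on field 1 only; -u keeps one line per distinct key.
LC_ALL=C sort -u -k1,1 "$tmp/hosts" > "$tmp/hosts.new"

# Which of the two 192.168.0.2 lines survives is up to the sort implementation.
cat "$tmp/hosts.new"
```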
--
Neil Bothwick
Please rotate your phone 90 degrees and try again.
* Re: [gentoo-user] Re: [OT] Question about duplicate lines in file
From: Teresa and Dale @ 2006-06-12 22:52 UTC
To: gentoo-user
Neil Bothwick wrote:
>On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:
>
>
>
>>If /etc/hosts has these lines:
>>127.0.0.1 localhost
>>127.0.0.1      localhost
>>uniq will see these as different even though they are actually the
>>same entry. So he needs something like tr to squash spaces. This will
>>do it (as root):
>>
>>cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new
>>
>>
>
>sort -u -k1,1 /etc/hosts >/etc/hosts.new
>
>avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
>(space delimited) and -u removes lines where the sort field is the same.
>
>
>
>
Well that removed a few, all of them to be exact. The file was blank.
O_O LOL I'm learning though.
Dale
:-)
* Re: [gentoo-user] Re: [OT] Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 23:23 UTC
To: gentoo-user
On Mon, 12 Jun 2006 17:52:21 -0500, Teresa and Dale wrote:
> >sort -u -k1,1 /etc/hosts >/etc/hosts.new
> >
> >avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
> >(space delimited) and -u removes lines where the sort field is the same.
> Well that removed a few, all of them to be exact. The file was blank.
> O_O LOL I'm learning though.
What's the format of the file? If it's a standard /etc/hosts layout, this
will remove duplicates based on IP address, but if you have another
field first, you need to change the key.
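As a hedged illustration of changing the key: in an ad-blocking list where every line starts with the same address, keying on field 1 collapses nearly everything, while keying on field 2 (the host name) keeps one line per host. The sample entries are assumptions, and the real file's layout is not known here. The `b` modifier makes sort skip the blanks in front of the key, so uneven spacing should not matter:

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '127.0.0.1 ads.example.com' \
  '127.0.0.1    tracker.example.net' \
  '127.0.0.1   ads.example.com' > "$tmp/hosts"

# Keyed on the address: every line has the same key, so only one survives.
by_ip=$(LC_ALL=C sort -u -k1,1 "$tmp/hosts" | wc -l | tr -d ' ')

# Keyed on the host name instead, ignoring leading blanks before the key.
LC_ALL=C sort -u -k2b,2 "$tmp/hosts" > "$tmp/hosts.new"
by_host=$(wc -l < "$tmp/hosts.new" | tr -d ' ')

echo "by address: $by_ip line(s), by host name: $by_host line(s)"
```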
--
Neil Bothwick
"Bother," said Pooh, as he connected at 300 bps.