public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] [OT]  Question about duplicate lines in file
From: Teresa and Dale @ 2006-06-12 16:42 UTC (permalink / raw
  To: gentoo-user

Hi folks,

I have batched a bunch of servers in my hosts file to block, for ads and
all that crap.  I got them from several different places, some I have
found too, and am sure there are dups in there, same server but pasted
from several sources.  I am not a programmer at all and don't even really
know what to search for.  I would like to remove the duplicate entries
and then put them in alphabetical order if I could.  I would gladly then
make this available if someone wanted to host it.  I don't have a place
to host it. 

Oh, there are 15,000 entries in my hosts file.  O_O

Could someone tell me how this is done?  May even learn something here. 
If I can do this, I'm sure I will. 

Thanks.

Dale
:-)  :-)
-- 
gentoo-user@gentoo.org mailing list



* Re: [gentoo-user] [OT]  Question about duplicate lines in file
From: Raymond Lewis Rebbeck @ 2006-06-12 16:54 UTC (permalink / raw
  To: gentoo-user

'uniq' and 'sort' should do what you're after, check out the man pages.

-- 
Raymond Lewis Rebbeck

* Re: [gentoo-user] [OT]  Question about duplicate lines in file
From: Teresa and Dale @ 2006-06-12 17:19 UTC (permalink / raw
  To: gentoo-user

Raymond Lewis Rebbeck wrote:

>'uniq' and 'sort' should do what you're after, check out the man pages.


Thanks, read the man page, it was short so it didn't take long.  I tried
this:

uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort

It doesn't look like it did anything but copy the same thing over. 
There are only 2 lines missing.  Do spaces count?  Some put in a lot
of spaces between the localhost and the web address.  Maybe that has an
effect?

Thanks for the help.  I had never seen that command before.  I had heard
of sort, never used it though.  I do have those on my desktop.  I'm
playing with copies instead of my real hosts file.

Thanks again.

Dale
:-)  :-)

* Re: [gentoo-user] [OT] Question about duplicate lines in file
From: Matthew Cline @ 2006-06-12 17:32 UTC (permalink / raw
  To: gentoo-user

On 6/12/06, Teresa and Dale <teendale@vista-express.com> wrote:
>
> Thanks, read the man page, it was short so it didn't take long.  I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>

I think that you need to run sort on the file first, then uniq.

HTH,

Matt

* Re: [gentoo-user] [OT]  Question about duplicate lines in file
From: Raymond Lewis Rebbeck @ 2006-06-12 17:37 UTC (permalink / raw
  To: gentoo-user

On Tuesday, 13 June 2006 2:49, Teresa and Dale wrote:
> Thanks, read the man page, it was short so it didn't take long.  I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
> It doesn't look like it did anything but copy the same thing over.
> There are only 2 lines missing.  Does spaces count?  Some put in a lot
> of spaces between the localhost and the web address.  Maybe that has a
> affect??

Yes, the spaces matter. You could use 'tr' to squeeze all repeated spaces 
into a single space.

$ tr -s ' ' < filename

That should do it, then you can pipe it through sort and uniq (in that 
order) and do whatever else you want with it.
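
To see the effect on a couple of lines, here is a small sketch. The sample
entries are invented, and the extra tr '\t' ' ' step is an addition for files
that mix tabs and spaces, since tr -s ' ' on its own leaves tabs alone.

```shell
# Two spellings of the same (made-up) entry: extra spaces vs. a tab.
raw=$(printf '127.0.0.1     ads.example.com\n127.0.0.1\tads.example.com\n')

# Turn tabs into spaces, then squeeze each run of spaces down to one.
normalized=$(printf '%s\n' "$raw" | tr '\t' ' ' | tr -s ' ')

# Both lines are now byte-for-byte identical, so sort -u keeps only one.
deduped=$(printf '%s\n' "$normalized" | sort -u)
echo "$deduped"
```

On a real file the same steps would read from the file with a redirect, as in
the tr command above, instead of from a variable.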

-- 
Raymond Lewis Rebbeck

* Re: [gentoo-user] [OT]  Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 17:37 UTC (permalink / raw
  To: gentoo-user


On Mon, 12 Jun 2006 12:19:46 -0500, Teresa and Dale wrote:

> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort

uniq only removes consecutive duplicate lines, so you need to use sort first:

sort file | uniq >newfile

or, possibly, depending on the format of your file

sort -u file >newfile
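
A small sketch of the difference, using made-up entries in a scratch file. It
also shows what the -u flag in the quoted uniq command does: uniq -u prints
only the lines that are never repeated, so a duplicated entry is dropped
altogether rather than kept once.

```shell
# Sample data is hypothetical: the first and third lines are duplicates
# that are not next to each other.
sample=$(mktemp)
printf '%s\n' \
    '127.0.0.1 ads.example.com' \
    '127.0.0.1 banner.example.net' \
    '127.0.0.1 ads.example.com' > "$sample"

# uniq alone: the duplicates are not adjacent, so all 3 lines survive.
plain_count=$(uniq "$sample" | wc -l)

# sort first: the duplicates become adjacent and collapse, leaving 2 lines.
sorted_count=$(sort "$sample" | uniq | wc -l)

# uniq -u prints only lines that occur once, so the duplicated entry
# disappears entirely instead of being kept once.
only_unique=$(sort "$sample" | uniq -u)

echo "uniq: $plain_count  sort|uniq: $sorted_count  sort|uniq -u: $only_unique"
rm -f "$sample"
```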


-- 
Neil Bothwick

Few women admit their age. Few men act theirs.

* Re: [gentoo-user] [OT]  Question about duplicate lines in file
From: Mike Williams @ 2006-06-12 17:45 UTC (permalink / raw
  To: gentoo-user

On Monday 12 June 2006 18:19, Teresa and Dale wrote:
> Thanks, read the man page, it was short so it didn't take long.  I tried
> this:

sort would be more appropriate. uniq only compares neighbouring lines, so 
I don't believe it will find matches elsewhere in the file, e.g.

192
195
192

wouldn't get shortened, but

192
192
195

would.

-- 
Mike Williams

* [gentoo-user]  Re: [OT]  Question about duplicate lines in file
From: Christer Ekholm @ 2006-06-12 17:55 UTC (permalink / raw
  To: gentoo-user; +Cc: Teresa and Dale

Teresa and Dale <teendale@vista-express.com> writes:

>
>
> Thanks, read the man page, it was short so it didn't take long.  I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
> It doesn't look like it did anything but copy the same thing over. 
> There are only 2 lines missing.  Does spaces count?  Some put in a lot
> of spaces between the localhost and the web address.  Maybe that has a
> affect??

The problem with uniq is that it (according to the manpage),

  "Discard all but one of successive identical lines"

You need to have a sorted file for uniq to do what you want, or sort
it with the -u  option

  sort -u hosts > hostsort

If you don't want to ruin your original order you have to do something
else. This is one way of doing it with perl.

  perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts > hostsort
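
For comparison, the same keep-the-first-occurrence idea can be sketched in awk.
This is an alternative spelling, not something from the man page quoted above,
and the sample hostnames are made up.

```shell
# Made-up input where the second "b" entry is a repeat; order is b, a, b, c.
input=$(printf '%s\n' 'b.example.com' 'a.example.com' 'b.example.com' 'c.example.com')

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only for first occurrences, and awk prints just those lines.
kept=$(printf '%s\n' "$input" | awk '!seen[$0]++')
echo "$kept"
```

Like the perl version, this holds every distinct line in memory, which should
be no trouble for a file of around 15,000 entries.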

--
 Christer

* Re: [gentoo-user]  Re: [OT]  Question about duplicate lines in file
From: Alan McKinnon @ 2006-06-12 18:39 UTC (permalink / raw
  To: gentoo-user

On Monday 12 June 2006 19:55, Christer Ekholm wrote:
> Teresa and Dale <teendale@vista-express.com> writes:
> > Thanks, read the man page, it was short so it didn't take long. 
> > I tried this:
> >
> > uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
> >
> > It doesn't look like it did anything but copy the same thing
> > over. There are only 2 lines missing.  Does spaces count?  Some
> > put in a lot of spaces between the localhost and the web address.
> >  Maybe that has a affect??
>
> The problem with uniq is that it (according to the manpage),
>
>   "Discard all but one of successive identical lines"
>
> You need to have a sorted file for uniq to do what you want, or
> sort it with the -u  option
>
>   sort -u hosts > hostsort
>
> If you don't want to ruin your original order you have to do
> something else. This is one way of doing it with perl.
>
>   perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts >
> hostsort


Almost there :-)

If /etc/hosts has these lines:
127.0.0.1 localhost
127.0.0.1  localhost
uniq will see these as different even though they are actually the 
same entry. So he needs something like tr to squash spaces. This will 
do it (as root):

cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new

If the new file is OK, use it to overwrite /etc/hosts

Explanation so Dale knows what I'm asking him to do:
cat sends the file to tr
tr finds all cases of two or more consecutive spaces and replaces them 
with one space
sort puts the lines in order, so that identical lines end up next to each other
uniq finds consecutive lines that are the same and throws away the 
extra ones. The -i is there just in case two entries differ in case 
only (as FQDNs are strictly speaking case insensitive). As mentioned 
by others, uniq only matches consecutive dupes, so the list must be 
sorted first
> /etc/hosts.new writes the final output to the named disk file
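
A sketch of the same pipeline run against a scratch copy rather than
/etc/hosts. The function name dedupe_hosts and the sample entries are made up
for illustration. It reads the file with a redirect instead of cat, and adds
-f to sort as a precaution: without it, entries that differ only in case may
not end up next to each other in some locales, and uniq -i only compares
neighbours.

```shell
# dedupe_hosts is a hypothetical helper name, not part of any package.
# $1 is the file to read, $2 is the file to write.
dedupe_hosts() {
    # squeeze spaces, sort ignoring case, drop case-insensitive repeats
    tr -s ' ' < "$1" | sort -f | uniq -i > "$2"
}

# Work in a throwaway directory so the real hosts file is never touched.
work=$(mktemp -d)
printf '%s\n' \
    '127.0.0.1 Ads.example.com' \
    '127.0.0.1 Zeta.example.com' \
    '127.0.0.1   ads.example.com' > "$work/hosts"

dedupe_hosts "$work/hosts" "$work/hosts.new"

# Compare line counts before deciding whether to use the new file.
before=$(wc -l < "$work/hosts")
after=$(wc -l < "$work/hosts.new")
echo "before: $before lines, after: $after lines"
rm -rf "$work"
```

Checking the line counts (or reading the new file) before copying it over the
original is a cheap safeguard against ending up with an empty hosts file.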

Cheers,
alan

p.s. Those 15,000 entries in your hosts file are, um, a lot :-)


-- 
If only me, you and dead people understand hex, 
how many people understand hex?

Alan McKinnon
alan at linuxholdings dot co dot za
+27 82, double three seven, one nine three five

* Re: [gentoo-user]  Re: [OT]  Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 19:15 UTC (permalink / raw
  To: gentoo-user


On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:

> If /etc/hosts has these lines:
> 127.0.0.1 localhost
> 127.0.0.1  localhost
> uniq will see these as different even though they are actually the 
> same entry. So he needs something like tr to squash spaces. This will 
> do it (as root):
> 
> cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new

sort -u -k1,1 /etc/hosts >/etc/hosts.new

avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
(space delimited) and -u removes all but one of the lines where the sort
field is the same.


-- 
Neil Bothwick

Please rotate your phone 90 degrees and try again.

* Re: [gentoo-user]  Re: [OT]  Question about duplicate lines in file
From: Teresa and Dale @ 2006-06-12 22:52 UTC (permalink / raw
  To: gentoo-user

Neil Bothwick wrote:

>On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:
>
>  
>
>>If /etc/hosts has these lines:
>>127.0.0.1 localhost
>>127.0.0.1  localhost
>>uniq will see these as different even though they are actually the 
>>same entry. So he needs something like tr to squash spaces. This will 
>>do it (as root):
>>
>>cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new
>>    
>>
>
>sort -u -k1,1 /etc/hosts >/etc/hosts.new
>
>avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
>(space delimited) and -u remove lines where the sort field is the same.
>
>
>  
>

Well that removed a few, all of them to be exact.  The file was blank. 
O_O  LOL  I'm learning though.

Dale
:-)

* Re: [gentoo-user]  Re: [OT]  Question about duplicate lines in file
From: Neil Bothwick @ 2006-06-12 23:23 UTC (permalink / raw
  To: gentoo-user


On Mon, 12 Jun 2006 17:52:21 -0500, Teresa and Dale wrote:

> >sort -u -k1,1 /etc/hosts >/etc/hosts.new
> >
> >avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
> >(space delimited) and -u remove lines where the sort field is the same.

> Well that removed a few, all of them to be exact.  The file was blank. 
> O_O  LOL  I'm learning though.

What's the format of the file? If it's a standard /etc/hosts layout, this
will remove duplicates based on IP address, but if you have another
field first, you need to change the key.
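
A small sketch of what changing the key could look like, assuming the usual
"IP hostname" layout where every blocked entry shares one address. The sample
entries are invented. The -b option is added so that extra blanks in front of
the hostname are not counted as part of the key.

```shell
# Made-up block list: three lines, two distinct hostnames, one shared IP.
hosts=$(printf '%s\n' \
    '127.0.0.1 ads.example.com' \
    '127.0.0.1 banner.example.net' \
    '127.0.0.1   ads.example.com')

# Key on field 1 (the IP): every line has the same key, so one line survives.
by_ip=$(printf '%s\n' "$hosts" | sort -u -k1,1 | wc -l)

# Key on field 2 (the hostname), ignoring leading blanks in the key:
# one line per distinct hostname survives.
by_name=$(printf '%s\n' "$hosts" | sort -b -u -k2,2 | wc -l)

echo "keyed on IP: $by_ip line(s), keyed on hostname: $by_name line(s)"
```

With the IP as the key, a block list pointing everything at one address
collapses to a single line, which is why the key matters here.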


-- 
Neil Bothwick

"Bother," said Pooh, as he connected at 300 bps.
