From: "leon j. breedt" <ljb@neverborn.org>
To: gentoo-dev@gentoo.org
Subject: [gentoo-dev] orphaned files on system?
Date: Thu, 17 Apr 2003 23:41:12 +1200
Message-ID: <20030417114112.GA27271@jeddah.neverborn.ORG>
[-- Attachment #1.1: Type: text/plain, Size: 1185 bytes --]
hi,
i use the attached script to scan for unpackaged files on my filesystem,
and found quite a few in /etc, /usr/lib, /usr/X11R6 as well as the
expected places. most of them were symlinks, the intention being fairly
obvious (like the NVIDIA OpenGL stuff).
but i was hoping someone could explain why files like /etc/make.conf, /etc/csh.env,
/etc/env.d/05gcc and /usr/include/awk/acconfig.h didn't belong to any package.
i run the script with:
$ ./gtfilelint -v -C gtfilelint.conf -o orphans.list
use -h to see available params. multiple -v increases verbosity.
if you have a system with lots of packages, it's going to take some time,
as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
for quick lookups, then runs /usr/bin/find on /, and compares results. exclusions
to the find output are made by adding python re module regexes to
gtfilelint.conf.
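
A minimal sketch of the compare step described above, written for Python 3 (the attached script targets Python 2). It assumes the three CONTENTS line forms that the script's own regexes expect (dir, obj, sym) and swaps the Berkeley hashdb cache for a plain in-memory set. The names parse_contents_line and find_orphans, and the sample data, are hypothetical and only serve the illustration.

```python
import re

# Line formats assumed from the regexes in the attached script:
#   dir <path>
#   obj <path> <md5> <mtime>
#   sym <path> -> <target ...>
_PATTERNS = [
    re.compile(r"^dir (\S.*)$"),
    re.compile(r"^obj (\S.*) (\S+) (\d+)\s*$"),
    re.compile(r"^sym (\S.*) -> .*$"),
]


def parse_contents_line(line):
    """Return the packaged path named by one CONTENTS line, or None."""
    line = line.strip()
    for pattern in _PATTERNS:
        m = pattern.match(line)
        if m:
            return m.group(1)
    return None


def find_orphans(contents_lines, system_paths):
    """Return the paths in system_paths that no CONTENTS line claims."""
    owned = set()
    for line in contents_lines:
        path = parse_contents_line(line)
        if path is not None:
            owned.add(path)
    return [p for p in system_paths if p not in owned]


# Hypothetical sample data standing in for CONTENTS files and find output.
contents = [
    "dir /usr/bin",
    "obj /usr/bin/foo d41d8cd98f00b204e9800998ecf8427e 1050579672",
    "sym /usr/bin/bar -> foo 1050579672",
]
paths = ["/usr/bin", "/usr/bin/foo", "/usr/bin/bar", "/etc/make.conf"]
orphans = find_orphans(contents, paths)
print(orphans)
```

A set is enough for a sketch; the on-disk hash database in the script presumably keeps memory use down when many packages are installed.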
if you run it as a regular user, you may get some error output from find about permissions.
you will want to specify a config file, otherwise you'll get a lot of stuff
you probably don't care about.
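
A small Python 3 sketch of how the exclusion regexes in the config file behave, assuming the same rules as parse_config in the attached script: one regex per line, with blank lines and lines starting with # skipped. The names load_exclusions and is_excluded are hypothetical. Note that re.match only anchors at the start of the path, so a pattern without a trailing $ also excludes longer paths that share the prefix.

```python
import re


def load_exclusions(lines):
    """Compile one regex per non-blank, non-comment config line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def is_excluded(path, patterns):
    # re.match anchors at the start of the string only, so "^/data"
    # excludes "/data", "/data/x" and also "/database".
    return any(p.match(path) for p in patterns)


# A few lines in the style of the attached gtfilelint.conf.
config = """\
# dynamic paths
^/var/log/.*
^/$
^/usr/local/doc$
^/data
"""
patterns = load_exclusions(config.splitlines())

for path in ["/var/log/messages", "/", "/usr/local/doc",
             "/usr/local/docs", "/database", "/etc/make.conf"]:
    print(path, is_excluded(path, patterns))
```

A pattern such as ^/var/log/.* does not match the directory /var/log itself, since it requires the trailing slash; add a separate line for the directory if it should be excluded too.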
hope someone finds this useful
leon
--
in the beginning, was the code.
[-- Attachment #1.2: gtfilelint --]
[-- Type: text/plain, Size: 5980 bytes --]
#!/usr/bin/env python
#
# Finds files on Gentoo Linux systems that do not belong
# to any installed package.
#
# Released under the GNU GPL.
#
# (C) Copyright 2003 Leon J. Breedt
#
# $Id$
import dbhash
import getopt
import os
import os.path
import re
import string
import sys
TRUE = 1
FALSE = 0
version = '0.1.1'
configfile = '/etc/gtfilelint.conf'
dbdir = '/var/db/pkg'
cachefile = '/tmp/gtfilelint.db'
outputfile = None
warnmissing = FALSE
findcmd = 'find / -print'
exclusions = []
verbosity = 0
cachedb = None

def verb(msg, level=1):
    if verbosity >= level:
        sys.stderr.write('-- %s\n' % msg)

def vverb(msg):
    verb(msg, 2)

def info(msg):
    sys.stderr.write('>> %s\n' % msg)

def error(msg):
    sys.stderr.write('error: %s\n' % msg)
    sys.exit(1)

def warn(msg):
    sys.stderr.write('warning: %s\n' % msg)

def usage():
    print 'usage: %s [options]' % sys.argv[0]
    print 'options:'
    print '-h|--help display this message'
    print '-V|--version print program version and exit'
    print '-v|--verbose print verbose messages about what is being done'
    print '-d|--dbdir directory containing package database (default: %s)' % dbdir
    print '-c|--cachefile file to place temporary cache in (default: %s)' % cachefile
    print '-C|--configfile configuration file (default: %s)' % configfile
    print '-o|--outputfile file to print orphan list to (default: stdout)'
    print '--warnmissing warn if files declared in CONTENTS don\'t exist'

def parse_cmdline():
    global verbosity, dbdir, cachefile, configfile, outputfile, warnmissing
    opts, args = getopt.getopt(sys.argv[1:], "hVvd:c:C:o:", ["help", "version", "verbose", "dbdir=", "cachefile=", "configfile=", "outputfile=", "warnmissing"])
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(0)
        if opt in ("-V", "--version"):
            print version
            sys.exit(0)
        if opt in ("-v", "--verbose"):
            verbosity = verbosity + 1
        if opt in ("-d", "--dbdir"):
            dbdir = arg
        if opt in ("-c", "--cachefile"):
            cachefile = arg
        if opt in ("-C", "--configfile"):
            configfile = arg
        if opt in ("-o", "--outputfile"):
            outputfile = arg
        if opt == "--warnmissing":
            warnmissing = TRUE

def parse_config():
    if not os.path.exists(configfile) or not os.access(configfile, os.R_OK):
        warn('missing configfile "%s"' % configfile)
        return
    fp = open(configfile, 'r')
    for line in fp.readlines():
        line = string.strip(line)
        if len(line) == 0:
            continue
        if line[0] == '#':
            continue
        exclusions.append(re.compile(line))
        verb('adding "%s" to list of exclusion regular expressions' % line)
    fp.close()

def cache_package_files(package, packagepath):
    verb('caching contents of "%s"' % package)
    fp = open(packagepath + '/CONTENTS')
    lineno = 0
    for line in fp.readlines():
        lineno = lineno + 1
        line = string.strip(line)
        if len(line) == 0:
            continue
        key = None
        m = re.match(r"^dir (\S.*)$", line)
        if m:
            key = m.group(1)
            m = None
        else:
            m = re.match(r"^obj (\S.*) (\S+) (\d+)\s*$", line)
            if m:
                key = m.group(1)
                m = None
            m = re.match(r"^sym (\S.*) -> .*$", line)
            if m:
                key = m.group(1)
        if key != None:
            if not os.path.exists(key) and warnmissing:
                warn('%s: "%s" does not exist on filesystem, ignoring' % (package, key))
            vverb('caching "%s"' % key)
            cachedb[key] = ''
        else:
            vverb('key is None for "%s" CONTENTS line %d' % (package, lineno))
    fp.close()

def scan_group_packages(group, grouppath):
    packages = os.listdir(grouppath)
    packages.sort()
    verb('found %d packages in group "%s"' % (len(packages), group))
    for package in packages:
        packagepath = grouppath + '/' + package
        cache_package_files(package, packagepath)

def create_system_filelist():
    info('scanning all files on system')
    sout = os.popen(findcmd, 'r')
    verb('reading paths from "%s"' % findcmd)
    paths = sout.readlines()
    orphans = 0
    rptfp = None
    for path in paths:
        path = string.strip(path)
        if len(path) == 0:
            continue
        if path[0] != '/':
            warn('ignoring relative path "%s"' % path)
            continue
        matched = FALSE
        for exre in exclusions:
            if exre.match(path):
                matched = TRUE
                break
        if matched:
            vverb('"%s" matched exclusion regex, ignoring' % path)
            continue
        if not cachedb.has_key(path):
            if orphans == 0:
                if outputfile:
                    info('writing orphaned file list [%s]' % outputfile)
                    rptfp = open(outputfile, 'w+')
                else:
                    info('orphaned files:')
                    rptfp = sys.stdout
                rptfp.flush()
            orphans = orphans + 1
            rptfp.write('%s\n' % path)
            rptfp.flush()
    if rptfp:
        rptfp.close()
    sout.close()
    if orphans > 0:
        info('%d orphaned file(s) found' % orphans)
    else:
        info('no orphaned files on system')

# Main
try:
    parse_cmdline()
    parse_config()
    info('creating packaged files cache [%s]' % cachefile)
    cachedb = dbhash.open(cachefile, 'n')
    try:
        groups = os.listdir(dbdir)
        groups.sort()
        for group in groups:
            grouppath = dbdir + '/' + group
            scan_group_packages(group, grouppath)
        create_system_filelist()
    finally:
        if cachedb:
            cachedb.close()
        os.unlink(cachefile)
except KeyboardInterrupt: 1
except:
    raise
[-- Attachment #1.3: gtfilelint.conf --]
[-- Type: text/plain, Size: 1099 bytes --]
# we don't really care about these dynamic paths
^/var/log/.*
^/var/db/.*
^/var/spool/.*
^/var/tmp/.*
^/var/lib/.*
^/var/cache/.*
^/var/run/.*
# / is not owned by any package
^/$
# don't care about root's config files
^/root/.*
# /usr/local is typically just user compiled stuff,
# don't care about it -- this list from baselayout
^/usr/local/bin/.*
^/usr/local/doc$
^/usr/local/lib/.*
^/usr/local/man$
^/usr/local/src/.*
^/usr/local/sbin/.*
^/usr/local/games/.*
^/usr/local/share/doc/.*
^/usr/local/share/man/.*
^/usr/local/share/.*
# what is /lib/dev-state? dunno...but the dir is in
# baselayout, even if the files aren't
^/lib/dev-state/.*
# devices aren't that important to us...the packaged
# files will not be visible anyway due to devfs
^/dev/.*
# anyone packaging anything into /tmp should be shot
^/tmp/.*
# portage tree we don't care about either
^/usr/portage$
^/usr/portage/.*
# mountpoints shouldn't have package files installed in them
^/mnt/.*
# system filesystems should be ignored
^/proc/.*
^/sys/.*
# USER CUSTOMIZATIONS
^/data
^/data/.*
^/cdrom.*
^/windata
^/windata/.*
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]