public inbox for gentoo-dev@lists.gentoo.org
From: "leon j. breedt" <ljb@neverborn.org>
To: gentoo-dev@gentoo.org
Subject: [gentoo-dev] orphaned files on system?
Date: Thu, 17 Apr 2003 23:41:12 +1200
Message-ID: <20030417114112.GA27271@jeddah.neverborn.ORG>


[-- Attachment #1.1: Type: text/plain, Size: 1185 bytes --]

hi,

i use the attached script to scan for unpackaged files on my filesystem,
and found quite a few in /etc, /usr/lib, /usr/X11R6 as well as the
expected places. most of them were symlinks, the intention being fairly
obvious (like the NVIDIA OpenGL stuff).

but i was hoping someone could explain why files like /etc/make.conf, /etc/csh.env,
/etc/env.d/05gcc and /usr/include/awk/acconfig.h didn't belong to any package.
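
To double-check a single path such as /etc/make.conf, one option is to search every CONTENTS file for it directly. Below is a minimal Python 3 sketch; it is not part of the attached script, which is Python 2. The names `entry_path` and `find_owners` are made up for this illustration, and the line formats are assumed from the regexes used in gtfilelint.

```python
import os
import re

# Patterns mirror the three entry types the attached script recognises.
ENTRY_PATTERNS = [
    re.compile(r"^dir (\S.*)$"),
    re.compile(r"^obj (\S.*) (\S+) (\d+)\s*$"),
    re.compile(r"^sym (\S.*) -> .*$"),
]


def entry_path(line):
    """Return the path named by one CONTENTS line, or None if unrecognised."""
    line = line.strip()
    for pattern in ENTRY_PATTERNS:
        m = pattern.match(line)
        if m:
            return m.group(1)
    return None


def find_owners(path, dbdir="/var/db/pkg"):
    """Return the 'group/package' names whose CONTENTS file lists path."""
    owners = []
    for group in sorted(os.listdir(dbdir)):
        grouppath = os.path.join(dbdir, group)
        if not os.path.isdir(grouppath):
            continue
        for package in sorted(os.listdir(grouppath)):
            contents = os.path.join(grouppath, package, "CONTENTS")
            if not os.path.isfile(contents):
                continue
            with open(contents, errors="replace") as fp:
                if any(entry_path(line) == path for line in fp):
                    owners.append(group + "/" + package)
    return owners
```

On a live system, `find_owners("/etc/make.conf")` returns an empty list when no installed package records that file.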

i run the script with:

$ ./gtfilelint -v -C gtfilelint.conf -o orphans.list

use -h to see available params. multiple -v increases verbosity.

if you have a system with lots of packages, it's going to take some time,
as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
for quick lookups, then runs /usr/bin/find on /, and compares results. exclusions
to the find output are made by adding python re module regexes to
gtfilelint.conf.
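
The compare step described above can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not the attached script: it uses an in-memory set instead of a Berkeley hashdb and `os.walk` instead of /usr/bin/find, and the names `iter_paths` and `find_orphans` are hypothetical.

```python
import os
import re


def iter_paths(root):
    """Yield root and every entry below it, roughly like 'find root -print'."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        # Symlinks to directories show up in dirnames; they are reported
        # here but not descended into.
        for name in dirnames + sorted(filenames):
            yield os.path.join(dirpath, name)


def find_orphans(root, owned, exclusions=()):
    """Return paths under root that are neither excluded nor in owned."""
    compiled = [re.compile(rx) for rx in exclusions]
    orphans = []
    for path in iter_paths(root):
        # Same test as the script: the regex is matched from the start
        # of the path.
        if any(rx.match(path) for rx in compiled):
            continue
        if path not in owned:
            orphans.append(path)
    return orphans
```

As in the script, excluded directories are still walked and only filtered from the output, so exclusions cut down the report rather than the scan time.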

if you run it as a regular user, you may get some error output from find about permissions.
you will want to specify a config file, otherwise you'll get a lot of stuff
you probably don't care about.
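
The Python 3 sketch below shows how the config file lines are interpreted, going by the format of the attached gtfilelint.conf: one regex per line, with blank lines and lines starting with # skipped. `load_exclusions` and `is_excluded` are hypothetical names for this example only.

```python
import re


def load_exclusions(lines):
    """Compile one regex per non-blank, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def is_excluded(path, patterns):
    """Return True if any pattern matches from the start of path."""
    # re.match anchors only at the start of the string, so a pattern
    # without a trailing "$" or "/" also matches longer names.
    return any(p.match(path) for p in patterns)
```

One consequence worth knowing when writing patterns: a line like `^/data` also excludes a path such as /database, while `^/var/log/.*` excludes everything below /var/log but not the directory /var/log itself.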
        
hope someone finds this useful

leon

-- 
in the beginning, was the code.


[-- Attachment #1.2: gtfilelint --]
[-- Type: text/plain, Size: 5980 bytes --]

#!/usr/bin/env python
#
# Finds files on Gentoo Linux systems that do not belong
# to any installed package.
#
# Released under the GNU GPL.
#
# (C) Copyright 2003 Leon J. Breedt
#
# $Id$

import dbhash
import getopt
import os
import os.path
import re
import string
import sys

TRUE = 1
FALSE = 0
version = '0.1.1'
configfile = '/etc/gtfilelint.conf'
dbdir = '/var/db/pkg'
cachefile = '/tmp/gtfilelint.db'
outputfile = None
warnmissing = FALSE
findcmd = 'find / -print'
exclusions = []
verbosity = 0
cachedb = None

def verb(msg, level=1):
    if verbosity >= level:
        sys.stderr.write('-- %s\n' % msg)

def vverb(msg):
    verb(msg, 2)

def info(msg):
    sys.stderr.write('>> %s\n' % msg)

def error(msg):
    sys.stderr.write('error: %s\n' % msg)
    sys.exit(1)

def warn(msg):
    sys.stderr.write('warning: %s\n' % msg)

def usage():
    print 'usage: %s [options]' % sys.argv[0]
    print 'options:'
    print '-h|--help       display this message'
    print '-V|--version    print program version and exit'
    print '-v|--verbose    print verbose messages about what is being done'
    print '-d|--dbdir      directory containing package database (default: %s)' % dbdir
    print '-c|--cachefile  file to place temporary cache in (default: %s)' % cachefile
    print '-C|--configfile configuration file (default: %s)' % configfile
    print '-o|--outputfile file to print orphan list to (default: stdout)'
    print '--warnmissing   warn if files declared in CONTENTS don\'t exist'

def parse_cmdline():
    global verbosity, dbdir, cachefile, configfile, outputfile, warnmissing
    opts, args = getopt.getopt(sys.argv[1:], "hVvd:c:C:o:", ["help", "version", "verbose", "dbdir=", "cachefile=", "configfile=", "outputfile=", "warnmissing"])
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(0)
        if opt in ("-V", "--version"):
            print version
            sys.exit(0)
        if opt in ("-v", "--verbose"):
            verbosity = verbosity + 1
        if opt in ("-d", "--dbdir"):
            dbdir = arg
        if opt in ("-c", "--cachefile"):
            cachefile = arg
        if opt in ("-C", "--configfile"):
            configfile = arg
        if opt in ("-o", "--outputfile"):
            outputfile = arg
        if opt == "--warnmissing":
            warnmissing = TRUE

def parse_config():
    if not os.path.exists(configfile) or not os.access(configfile, os.R_OK):
        warn('missing configfile "%s"' % configfile)
        return
    fp = open(configfile, 'r')
    for line in fp.readlines():
        line = string.strip(line)
        if len(line) == 0:
            continue
        if line[0] == '#':
            continue
        exclusions.append(re.compile(line))
        verb('adding "%s" to list of exclusion regular expressions' % line)
    fp.close()

def cache_package_files(package, packagepath):
    verb('caching contents of "%s"' % package)
    fp = open(packagepath + '/CONTENTS')
    lineno = 0
    for line in fp.readlines():
        lineno = lineno + 1
        line = string.strip(line)
        if len(line) == 0:
            continue
        key = None
        # try each CONTENTS entry type in turn: dir, obj, sym
        m = re.match(r"^dir (\S.*)$", line)
        if not m:
            m = re.match(r"^obj (\S.*) (\S+) (\d+)\s*$", line)
        if not m:
            m = re.match(r"^sym (\S.*) -> .*$", line)
        if m:
            key = m.group(1)
        if key != None:
            if not os.path.exists(key) and warnmissing:
                # the path is still cached below; only a warning is issued
                warn('%s: "%s" does not exist on filesystem' % (package, key))
            vverb('caching "%s"' % key)
            cachedb[key] = ''
        else:
            vverb('key is None for "%s" CONTENTS line %d' % (package, lineno))
    fp.close()

def scan_group_packages(group, grouppath):
    packages = os.listdir(grouppath)
    packages.sort()
    verb('found %d packages in group "%s"' % (len(packages), group))
    for package in packages:
        packagepath = grouppath + '/' + package
        cache_package_files(package, packagepath)

def create_system_filelist():
    info('scanning all files on system')
    sout = os.popen(findcmd, 'r')
    verb('reading paths from "%s"' % findcmd)
    paths = sout.readlines()
    orphans = 0
    rptfp = None
    for path in paths:
        path = string.strip(path)
        if len(path) == 0:
            continue
        if path[0] != '/':
            warn('ignoring relative path "%s"' % path)
            continue
        matched = FALSE
        for exre in exclusions:
            if exre.match(path):
                matched = TRUE
                break
        if matched:
            vverb('"%s" matched exclusion regex, ignoring' % path)
            continue
        if not cachedb.has_key(path):
            if orphans == 0:
                if outputfile:
                    info('writing orphaned file list [%s]' % outputfile)
                    rptfp = open(outputfile, 'w+')
                else:
                    info('orphaned files:')
                    rptfp = sys.stdout
                rptfp.flush()
            orphans = orphans + 1
            rptfp.write('%s\n' % path)
            rptfp.flush()
    # only close the report if it is a real file, not sys.stdout
    if rptfp and rptfp is not sys.stdout:
        rptfp.close()
    sout.close()
    if orphans > 0:
        info('%d orphaned file(s) found' % orphans)
    else:
        info('no orphaned files on system')

# Main
try:
    parse_cmdline()
    parse_config()
    info('creating packaged files cache [%s]' % cachefile)
    cachedb = dbhash.open(cachefile, 'n')
    try:
        groups = os.listdir(dbdir)
        groups.sort()
        for group in groups:
            grouppath = dbdir + '/' + group
            scan_group_packages(group, grouppath)
        create_system_filelist()
    finally:
        if cachedb:
            cachedb.close()
            os.unlink(cachefile)
except KeyboardInterrupt:
    # exit quietly on Ctrl-C; any other exception propagates as usual
    pass

[-- Attachment #1.3: gtfilelint.conf --]
[-- Type: text/plain, Size: 1099 bytes --]

# we don't really care about these dynamic paths
^/var/log/.*
^/var/db/.*
^/var/spool/.*
^/var/tmp/.*
^/var/lib/.*
^/var/cache/.*
^/var/run/.*

# / is not owned by any package
^/$

# don't care about root's config files
^/root/.*

# /usr/local is typically just user compiled stuff,
# don't care about it -- this list from baselayout
^/usr/local/bin/.*
^/usr/local/doc$
^/usr/local/lib/.*
^/usr/local/man$
^/usr/local/src/.*
^/usr/local/sbin/.*
^/usr/local/games/.*
^/usr/local/share/doc/.*
^/usr/local/share/man/.*
^/usr/local/share/.*

# what is /lib/dev-state? dunno...but the dir is in
# baselayout, even if the files arent
^/lib/dev-state/.*

# devices aren't that important to us...the packaged
# files will not be visible anyway due to devfs
^/dev/.*

# anyone packaging anything into /tmp should be shot
^/tmp/.*

# portage tree we don't care about either
^/usr/portage$
^/usr/portage/.*

# mountpoints shouldn't have package files installed in them
^/mnt/.*

# system filesystems should be ignored
^/proc/.*
^/sys/.*

# USER CUSTOMIZATIONS
^/data
^/data/.*
^/cdrom.*
^/windata
^/windata/.*

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
