* [gentoo-user] Document management solution [possibly a bit off-topic...]
From: Steve [Gentoo] @ 2005-09-29 14:45 UTC
To: gentoo-user
I think I want a "document management solution" - though I'm not sure
that everyone understands the same idea by the term.
I've got a filing cabinet full of paperwork which is an absolute
nightmare to cope with. One of the key problems is that the documents
want to be indexed in different ways. All the documents are dated, but
they can be further sub-divided by subjects - lots of documents
appertain to several subjects. I frequently need to find either a
specific document, a sequence of related documents or similar. I rarely
need the original document - but often want a copy or just to check
some detail or other. Some documents are multi-page, some single
page... all can be easily scanned.
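As a rough illustration of the indexing problem described above (every document has a date, and many belong to several subjects), here is a minimal sketch using Python's standard sqlite3 module. The table names, column names and sample documents are all hypothetical; nothing here comes from an existing package.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE document (
    id       INTEGER PRIMARY KEY,
    doc_date TEXT NOT NULL,      -- ISO date, e.g. '2005-09-29'
    title    TEXT NOT NULL,
    pages    INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE subject (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Link table: one row per (document, subject) pair, so a document
-- can be filed under any number of subjects.
CREATE TABLE document_subject (
    document_id INTEGER NOT NULL REFERENCES document(id),
    subject_id  INTEGER NOT NULL REFERENCES subject(id),
    PRIMARY KEY (document_id, subject_id)
);
"""


def add_document(conn, doc_date, title, subjects, pages=1):
    """Store one document and link it to every subject it belongs to."""
    cur = conn.execute(
        "INSERT INTO document (doc_date, title, pages) VALUES (?, ?, ?)",
        (doc_date, title, pages),
    )
    doc_id = cur.lastrowid
    for name in subjects:
        conn.execute("INSERT OR IGNORE INTO subject (name) VALUES (?)", (name,))
        (subject_id,) = conn.execute(
            "SELECT id FROM subject WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO document_subject (document_id, subject_id) VALUES (?, ?)",
            (doc_id, subject_id),
        )
    return doc_id


def find_by_subject(conn, name):
    """Return (date, title) pairs filed under one subject, oldest first."""
    return conn.execute(
        """
        SELECT d.doc_date, d.title
        FROM document d
        JOIN document_subject ds ON ds.document_id = d.id
        JOIN subject s ON s.id = ds.subject_id
        WHERE s.name = ?
        ORDER BY d.doc_date
        """,
        (name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_document(conn, "2005-07-01", "Electricity invoice Q2", ["utilities", "tax"])
add_document(conn, "2005-04-01", "Electricity invoice Q1", ["utilities"])
add_document(conn, "2005-06-15", "Bank statement", ["banking", "tax"])
print(find_by_subject(conn, "tax"))
```

The link table is what lets one scanned document appear under several subjects without being stored twice. The date column gives the "sequence of related documents" ordering.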
I'm interested in finding software which minimises the burden of
managing these documents - probably as scanned images. I'm familiar
with the DjVuLibre library and think that format is fantastic - though
a less ambitious format would likely suffice (even at 200dpi grey scale
jpegs I get ~10,000 pages without needing more than one DVD to back
up...) A significant burden is in scanning and storing all these
documents - and this makes a good UI essential - preferably allowing a
single click to scan a document (incidentally can anyone recommend a
good, cheap, sheet-fed scanner?) before page-preview (cropping/rotating)
and assignment of "subject" classification and date-stamping. It would
be useful if there was an OCR pass in order to extract plain-text and to
index that - though this feature is not essential. There would need to
be a friendly UI in order to find all the documents matching a
given subject classification (or group of classifications) - to preview
on-screen and offer an option to print... preferably in-order... maybe
with a watermark dating the copy?
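The one-DVD estimate above can be sanity-checked with a little arithmetic. This sketch assumes a single-layer DVD of about 4.7e9 bytes and A4 pages; neither figure is stated in the message, so treat the result as a rough guide only.

```python
# Assumptions (not from the original message): a single-layer DVD holding
# about 4.7e9 bytes, and A4 pages of 8.27 x 11.69 inches.
DVD_BYTES = 4.7e9
PAGES = 10_000
DPI = 200
PAGE_INCHES = (8.27, 11.69)


def bytes_per_page(capacity_bytes, pages):
    """Average storage budget available for each page."""
    return capacity_bytes / pages


def raw_greyscale_bytes(dpi, size_inches):
    """Uncompressed size of one page at 8 bits (one byte) per pixel."""
    width_px = round(size_inches[0] * dpi)
    height_px = round(size_inches[1] * dpi)
    return width_px * height_px


budget = bytes_per_page(DVD_BYTES, PAGES)
raw = raw_greyscale_bytes(DPI, PAGE_INCHES)
ratio = raw / budget  # compression the JPEG encoder must achieve on average

print(f"budget per page: {budget / 1000:.0f} kB")
print(f"raw page size:   {raw / 1e6:.2f} MB")
print(f"required compression: about {ratio:.1f}:1")
```

Under these assumptions each page has a budget of a few hundred kilobytes, which needs a compression ratio of roughly 8:1.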
Is anyone aware of any existing packages? Preferably for Gentoo, but
any open-source solution would suffice.
Thanks in advance for any suggestions :-)
Steve
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] Document management solution [possibly a bit off-topic...]
From: A. Khattri @ 2005-09-29 17:29 UTC
To: gentoo-user
On Thu, 29 Sep 2005, Steve [Gentoo] wrote:
> I think I want a "document management solution" - though I'm not sure
> that everyone understands the same idea by the term.
This might be overkill:
http://www.alfresco.org/
Or maybe something like ScrollKeeper would suffice?
* Re: [gentoo-user] Document management solution [possibly a bit off-topic...]
From: Steve [Gentoo] @ 2005-09-29 17:53 UTC
To: gentoo-user
A. Khattri wrote:
> On Thu, 29 Sep 2005, Steve [Gentoo] wrote:
>> I think I want a "document management solution" - though I'm not sure
>> that everyone understands the same idea by the term.
>>
> This might be overkill:
> http://www.alfresco.org/
>
Alfresco is what I'd have called a content management system - as
opposed to a document management system. I'm interested in managing
archives of documents I have received from other people (in dead-tree
format)...
> Or maybe something like ScrollKeeper would suffice?
ScrollKeeper seems to target electronic manuals etc. (as far as I can
tell) - it doesn't appear to be focused on scanned documents. The
typical sort of documents I need to manage include monthly and quarterly
invoices and statements etc. from a wide variety of vendors.
Like Alfresco, I'd say that ScrollKeeper looks more like a content
management system than a document management system...
* Re: [gentoo-user] Document management solution [possibly a bit off-topic...]
From: A. Khattri @ 2005-09-29 20:52 UTC
To: gentoo-user
On Thu, 29 Sep 2005, Steve [Gentoo] wrote:
> Alfresco is what I'd have called a content management system - as
> opposed to a document management system. I'm interested in managing
> archives of documents I have received from other people (in dead-tree
> format)...
If there was something that scanned the document, performed OCR on it,
checked the OCR output and then built an electronic repository for you I'd
recommend it. Until then, Alfresco is the closest thing I've seen that is
open source. If you're willing to do your own scanning and OCR'ing then it
will do the rest.
BTW, I would call things like Mambo or Xaraya content-management tools -
Alfresco is a slightly different kettle of fish.
* Re: [gentoo-user] Document management solution [possibly a bit off-topic...]
From: Nick Rout @ 2005-09-29 22:36 UTC
To: gentoo-user
On Thu, 29 Sep 2005 16:52:54 -0400 (EDT)
A. Khattri wrote:
> On Thu, 29 Sep 2005, Steve [Gentoo] wrote:
>
> > Alfresco is what I'd have called a content management system - as
> > opposed to a document management system. I'm interested in managing
> > archives of documents I have received from other people (in dead-tree
> > format)...
>
> If there was something that scanned the document, performed OCR on it,
> checked the OCR output and then built an electronic repository for you I'd
> recommend it. Until then, Alfresco is the closest thing I've seen that is
> open source. If you're willing to do your own scanning and OCR'ing then it
> will do the rest.
>
> BTW, I would call things like Mambo or Xaraya content-management tools -
> Alfresco is a slightly different kettle of fish.
Yes, I know what Steve is after, and I'd love to find a way. I was put
off by Alfresco being called "Content Management" because all of the
content management systems I have seen end up building something that
resembles [name your favourite news website].
A closer look at Alfresco reveals that it does look more like what
Steve (and I) are after.
I am a lawyer and I handle hundreds of documents every week, from email
through PDF (both generated from an electronic source, so all the text
is available, and scanned), OpenOffice (one enlightened client!), Word,
Excel, HTML, faxes and letters (on paper, ya know!). You name it,
someone will send me something in it!
It'd be great to have a metadata system where I could give everything
some keywords (client name, file number, matter number, subjects,
useful as a precedent, useful case, etc.) so that in future I can:

pull up every document on my computer, my secretary's computer, my mail
server (including attachments), my file server or my Palm Pilot relating
to a particular client

pull up every document about company debentures

find the case I downloaded and stored somewhere about liability of
guarantors in a consumer credit loan

find the seminar book for the seminar I went to on some new area of
law

find a letter written by Joe Bloggs sometime in 2003.
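To make the wish list above concrete, here is a small pure-Python sketch of keyword metadata attached to items that live in different places. The `Record` fields, the `search` helper, the location strings and the sample entries are all hypothetical. It only shows the shape of the queries and does not describe how any existing product works.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    """One stored item plus its metadata; all field names are hypothetical."""
    location: str   # where the item lives, e.g. a file path or mailbox id
    client: str
    author: str
    written: date
    keywords: set = field(default_factory=set)


def search(records, client=None, author=None, year=None, keyword=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if client is not None and rec.client != client:
            continue
        if author is not None and rec.author != author:
            continue
        if year is not None and rec.written.year != year:
            continue
        if keyword is not None and keyword not in rec.keywords:
            continue
        hits.append(rec)
    return hits


# Made-up sample data spread across several machines.
records = [
    Record("fileserver:/clients/acme/debenture.pdf", "Acme Ltd", "Nick Rout",
           date(2004, 3, 2), {"company debentures", "precedent"}),
    Record("mailserver:inbox/1042", "Acme Ltd", "Joe Bloggs",
           date(2003, 6, 17), {"letter"}),
    Record("laptop:cases/guarantors.html", "Smith", "unknown",
           date(2005, 1, 10), {"guarantors", "consumer credit", "useful case"}),
]

for rec in search(records, author="Joe Bloggs", year=2003):
    print(rec.location)
```

The point of the design is that the metadata sits in one catalogue while the items stay where they are. A real system would also need a way to keep the catalogue in step with each source.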
--
Nick Rout <nick@rout.co.nz>
* Re: [gentoo-user] Document management solution [possibly a bit off-topic...]
From: Eric Crossman @ 2005-09-30 1:43 UTC
To: gentoo-user
On Fri, 2005-09-30 at 10:36 +1200, Nick Rout wrote:
> It'd be great to have a metadata system where I could give everything
> some keywords:
>
> client name, file number, matter number, subjects, useful as a
> precedent, useful case etc etc etc so that in future I can :
>
> pull up every document on my computer, my secretary's computer, my mail
> server (including attachments), my file server, my palm pilot, relating
> to a particular client
>
> pull up every document about company debentures
>
> find the case I downloaded and stored somewhere about liability of
> guarantors in a consumer credit loan
>
> find the seminar book for the seminar I went to on some new area of
> law.
>
> find a letter written by Joe Bloggs sometime in 2003.
I'm not sure if what you're describing exists right now in the open
source world, but I can tell you that it certainly does in the
commercial world. I used to work in the "metadata" department for a
startup here in upstate NY, USA that built a web-based application
targeting lawyers such as yourself. It was written in PHP/MySQL but the
database was being migrated to Oracle due to the rapid growth in the
database tables.
Unfortunately though, in the migration to Oracle, they elected to create
a "dynamic" scheme to support adding custom metadata fields as requested
per client. It was great for flexibility but the performance was
horrible even on quad 3 GHz Xeon boxes with maxed out memory. For us
programmers, it also made the easy queries difficult and the hard
queries near impossible.
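The message does not say how the "dynamic" scheme was built. Assuming it resembled a one-row-per-field layout (often called entity-attribute-value), the sketch below shows why even a simple two-field lookup becomes awkward. It uses SQLite through Python purely for illustration, not Oracle or MySQL, and all table names and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fixed schema: one column per metadata field.
CREATE TABLE doc_fixed (
    id INTEGER PRIMARY KEY, client TEXT, matter TEXT, year INTEGER
);
-- Assumed 'dynamic' layout: one row per (document, field, value) triple,
-- so new fields can be added without altering the table.
CREATE TABLE doc_attr (doc_id INTEGER, field TEXT, value TEXT);
""")

docs = [
    (1, "Acme", "M-100", 2003),
    (2, "Acme", "M-101", 2004),
    (3, "Bloggs", "M-200", 2003),
]
conn.executemany("INSERT INTO doc_fixed VALUES (?, ?, ?, ?)", docs)
for doc_id, client, matter, year in docs:
    conn.executemany(
        "INSERT INTO doc_attr VALUES (?, ?, ?)",
        [(doc_id, "client", client),
         (doc_id, "matter", matter),
         (doc_id, "year", str(year))],
    )

# With fixed columns, two criteria are two simple predicates.
FIXED_QUERY = "SELECT id FROM doc_fixed WHERE client = ? AND year = ? ORDER BY id"

# With the dynamic layout, each extra criterion costs another self-join,
# and every value is compared as text.
DYNAMIC_QUERY = """
SELECT c.doc_id
FROM doc_attr AS c
JOIN doc_attr AS y ON y.doc_id = c.doc_id
WHERE c.field = 'client' AND c.value = ?
  AND y.field = 'year'   AND y.value = ?
ORDER BY c.doc_id
"""

fixed_ids = [row[0] for row in conn.execute(FIXED_QUERY, ("Acme", 2003))]
dynamic_ids = [row[0] for row in conn.execute(DYNAMIC_QUERY, ("Acme", "2003"))]
print(fixed_ids, dynamic_ids)
```

Both queries return the same documents, but the second grows by one join per field searched. That is one plausible reading of "easy queries difficult"; the performance problems described above would depend on data volume and indexing, which this sketch does not model.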
Eric