Help on Extracting Textual Data



paul_orman
2004-06-30, 11:46 PM
Hi
I am a consultant preparing a Functional Requirement Specification for a client with a large number of drawing sheets, some dating back to the 1960s, stored in a number of Lotus Notes databases. Most of the files are either native AutoCAD 2000 drawings or were scanned and converted to AutoCAD 2000. The client has difficulty retrieving drawings because there was no naming convention in place and there is little metadata.

I don’t know much about CAD and do not want to propose a solution that is impossible.

What I propose is to write an application that extracts each file from the database, opens it, extracts all textual information from the CAD file, and stores it in a separate text document. A second application would then process the text files to find keywords and build up metadata. Keywords would be sourced from other systems (Asset Register, Site Address systems, etc.) and would be things like Address, Easting and Northing, Foundation, and various acronyms.
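To make the second step concrete, here is a minimal Python sketch of the keyword pass over one extracted text file. Everything in it is an assumption for illustration: the keyword list, the `build_metadata` name, and the "label then number" format assumed for Easting and Northing values. A real list would come from the Asset Register and Site Address systems.

```python
import re

# Hypothetical keyword list; in practice it would be loaded from the
# Asset Register or Site Address systems.
KEYWORDS = ["Address", "Easting", "Northing", "Foundation"]

# Assumed format: the label, an optional ":" or "=", then a number.
COORD_PATTERN = re.compile(
    r"\b(Easting|Northing)\b\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def build_metadata(text, keywords=KEYWORDS):
    """Return the keywords found in the text plus any coordinate values."""
    lowered = text.lower()
    # Whole-word, case-insensitive match for each keyword.
    found = sorted(
        k for k in keywords
        if re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered)
    )
    # Map "easting"/"northing" to the number that follows the label.
    coords = {
        m.group(1).lower(): float(m.group(2))
        for m in COORD_PATTERN.finditer(text)
    }
    return {"keywords": found, "coordinates": coords}
```

Calling `build_metadata` on the text extracted from one drawing gives a small dictionary that could be written back to the Notes document as metadata fields. Acronyms and free-form addresses would likely need looser matching than this.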

My questions are:

Is it possible to manipulate (open, extract, close) AutoCAD 2000 files from an external (non-ACAD) application? ACAD supports VBA; can I just reference the ACAD libraries to access the CAD objects?

How easy or difficult is it to extract all textual information from an AutoCAD 2000 file? I assume text is stored in text blocks, so it will be a matter of finding them and extracting the data.

Does a file that has been scanned and converted store textual information that is accessible as text or is the text in vector format? Are there tools/libraries that can access this text?

Thanks

richard.binning
2004-07-01, 02:31 AM
Hi Paul...see inline comments below:


Is it possible to manipulate (open, extract, close) AutoCAD 2000 files from an external (non-ACAD) application?
[RLB] Yes, but AutoCAD must be installed on the machine doing the processing. It can run minimized or hidden if you like.

(ACAD supports VBA, can I just reference the ACAD libraries to access the CAD objects?)

[RLB] You will have to reference the ACAD libraries. Might I suggest using ObjectDBX? You'll need a session of AutoCAD running to use these libraries, but your processes will run much faster through ObjectDBX than through native VBA inside AutoCAD. Check out this link (http://www.integr-8.com/AU2003/ObjectDBX.htm) for example code and information. Download the handout too!

How easy or difficult is it to extract all textual information from an AutoCAD 2000 file?

[RLB] You will need to check for text, mtext, attributes, and possibly rtext. The textual data could reside in the chosen file, in a linked file called an external reference (xref), or inside a block or nested collection of blocks in either of those files.
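The nesting is the part that is easy to miss, so here is a rough Python sketch of the traversal logic only. It does not use the AutoCAD libraries: the drawing is modelled as plain dictionaries, and the `collect_text` name, the entity layout, and the `blocks` lookup are all hypothetical. An xref is assumed to behave like one more named block in that lookup.

```python
# Entity kinds that carry text directly (names follow the reply above).
TEXT_TYPES = {"TEXT", "MTEXT", "ATTRIB", "RTEXT"}


def collect_text(entities, blocks, seen=None):
    """Gather text from entities, descending into inserted blocks.

    entities: list of dicts such as {"type": "TEXT", "value": "..."} or
              {"type": "INSERT", "block": "NAME", "attributes": [...]}.
    blocks:   dict mapping a block (or xref) name to its list of entities.
    """
    seen = set() if seen is None else seen
    found = []
    for entity in entities:
        kind = entity["type"]
        if kind in TEXT_TYPES:
            found.append(entity["value"])
        elif kind == "INSERT":
            # Attribute values belong to this particular insert.
            found.extend(a["value"] for a in entity.get("attributes", []))
            name = entity["block"]
            # Read each block definition once; this also guards against
            # circular references between blocks.
            if name not in seen:
                seen.add(name)
                found.extend(collect_text(blocks.get(name, []), blocks, seen))
        # Any other entity kind (lines, arcs, ...) carries no text here.
    return found
```

In a real implementation the dictionaries would be replaced by whatever objects the ACAD or ObjectDBX libraries return, but the shape of the loop (check the entity kind, collect attributes, recurse into the block definition) should carry over.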

I assume text is stored in text blocks, so it will be a matter of finding them and extracting the data.

Does a file that has been scanned and converted store textual information that is accessible as text or is the text in vector format?

[RLB] Depends on what you mean by "scanned". If the plot was scanned in as an image and then brought into AutoCAD, the image is raster-based: it only appears to contain text, and you will need some sort of OCR conversion tool to recover it.

Are there tools/libraries that can access this text?

[RLB] Yes...see above.

Thanks