[freearchitecture] An open CAD file format
Bruno Postle
bruno at postle.net
Sat Feb 15 17:00:20 GMT 2003
On Fri 14-Feb-2003 at 05:35:02 +0000, Bruno Postle wrote:
>
> Here are my requirements for a properly open and RCS-able CAD file
> format:
>
> 1. Text only, <cr><lf> delimited, human readable.
> 2. Blocks/references have full inheritance and polymorphism.
> 3. Resources (images, multiline text blocks) exist as separately
> editable files, not embedded in larger files.
> 4. Changes to drawing objects can be made in-place without
> regenerating/reordering the whole dataset.
I was going to do something else this afternoon :-) Anyway, here is
Bruno's Ideal CAD File Format; I really think that something this
hare-brained is needed to bring Computer Aided Design up to speed
with developments in Free Software.
(Apologies if this is a bit technical, it assumes some experience of
CAD as well as file-formats in general)
Features
--------
o A Drawing is a directory full of files.
This is the one radical idea: instead of encapsulating all the
data in a big structured file format like XML, simply use the
file-system as a structured data store.
The idea comes from the Maildir format for storing mail; this has
lots of advantages over the traditional mbox single-file format
(random-access, easy searches, simple deletes, simple appends, no
locking). With normal file-systems, Maildir performs just as
well as a single-file format until there are tens of thousands of
items in one directory. With modern file-systems like ReiserFS,
there are no performance issues with millions of items.
o One file per object.
Every basic object (line, circle, spline, text, reference etc.)
is stored in a single file, one object per file. This means that
if your drawing consists of 1000 circles, then there will be
1000+ files in the directory representing that drawing.
o Packaging via zip.
Obviously thousands of files will seem extremely strange to
people who are accustomed to one-file-per-drawing.
For packaging purposes, all the drawing data can be simply
zipped-up; this is the strategy adopted by OpenOffice.org - where
each word-processing file is actually a zip file containing
several other files.
(tar.gz is better for this job, but zip has wider availability)
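As a rough sketch of the packaging step, here is how a drawing directory could be zipped and unzipped with Python's standard zipfile module. The function names pack_drawing and unpack_drawing are made up for illustration, and the archive is assumed to live outside the drawing directory.

```python
import zipfile
from pathlib import Path


def pack_drawing(drawing_dir, archive_path):
    """Zip every file under drawing_dir, keeping relative paths."""
    drawing = Path(drawing_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(drawing.rglob("*")):
            if path.is_file():
                # Store paths relative to the drawing so that
                # sub-directories (blocks) survive the round trip.
                archive.write(path, path.relative_to(drawing).as_posix())
    return archive_path


def unpack_drawing(archive_path, target_dir):
    """Extract a packed drawing into target_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
```

Empty sub-directories are not stored by this sketch; only files are.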
o Persistent names for all objects.
Each object needs a key/name/filename; this should be unique and
persistent through the lifetime of the object. Something like
this:
<HOSTNAME>-<UNIX_DATE>-<PID>-<INODE>.<OBJECT_TYPE>
Here's an example:
celery-1045317918-1278-225423.circle
Some objects will need to be renamed so they can be remembered
later; this name will then be accessible from within the CAD
program:
enough-room-to-swing-a-cat.circle
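A minimal sketch of how such a name could be generated, assuming a POSIX-like file-system where renaming a file within a directory keeps its inode. The helper names object_name and create_object, and the temporary ".tmp-" prefix, are hypothetical.

```python
import os
import socket
import time


def object_name(path, object_type):
    """Build <HOSTNAME>-<UNIX_DATE>-<PID>-<INODE>.<OBJECT_TYPE> for a file."""
    hostname = socket.gethostname().split(".")[0]  # short host name
    unix_date = int(time.time())
    pid = os.getpid()
    inode = os.stat(path).st_ino
    return f"{hostname}-{unix_date}-{pid}-{inode}.{object_type}"


def create_object(drawing_dir, object_type):
    """Create an empty object file under its persistent name."""
    # Create the file under a temporary name first, so that its
    # inode is known before the final name is chosen.
    tmp_name = f".tmp-{os.getpid()}-{time.time_ns()}"
    tmp_path = os.path.join(drawing_dir, tmp_name)
    with open(tmp_path, "x", encoding="utf-8"):
        pass
    final_path = os.path.join(drawing_dir, object_name(tmp_path, object_type))
    os.rename(tmp_path, final_path)
    return final_path
```

Calling create_object("plan.drawing", "circle") on an existing directory would leave one new, empty .circle file there, ready to be filled in.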
o One line per data item.
The entire contents of a file might look something like this:
Centre-X: 345.678
Centre-Y: 9876.543
Centre-Z: 0.0
Radius: 246.8
Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
Units: millimetres
Each line is <CR><LF> separated so that it can be reliably edited
on various platforms. All data would be UTF-8.
The diff for this after simply changing the radius of the circle
would be human readable and look like this:
diff .celery-1045317918-1278-225423.circle celery-1045317918-1278-225423.circle
4c4
< Radius: 246.8
---
> Radius: 250.0
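To show how little code is needed to read and write this kind of file, here is a sketch of a parser and a writer. It assumes every line has the form "Key: value" and that values never span lines; parse_object and format_object are illustrative names only.

```python
def parse_object(text):
    """Turn 'Key: value' lines into a dict, preserving order."""
    fields = {}
    for line in text.splitlines():  # copes with <CR><LF> and bare <LF>
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Key: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def format_object(fields):
    """Write fields back out, one per line, <CR><LF> terminated."""
    return "".join(f"{key}: {value}\r\n" for key, value in fields.items())


def read_object(path):
    """Read one object file from disk as UTF-8."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_object(handle.read())
```

Because the field order is preserved, changing one value and writing the file back touches only that line, which keeps the diff as small as the one above.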
o Object properties stored in lookup tables.
Geometrical data, like a circle's radius, needs to be stored in
the object itself; other types of property, such as layer, color,
linetype, visibility etc., need to be stored in lookup tables.
This is one of the features of existing file formats that needs
to be preserved. For example, in AutoCAD, "layer 0" doesn't
indicate that a circle is "in layer zero", it means that the
circle "doesn't have a layer defined".
So there may be a lookup table for "layer" with a name like this:
celery-1045317959-1278-3652.layer
..it might have content like this:
celery-1045317918-1278-225423.circle: walls
celery-1045317921-1278-3565.circle: walls
celery-1045318034-1278-36467.circle: windows
celery-1045318056-1278-466875.circle: windows
enough-room-to-swing-a-cat.circle: construction
Any items not listed in a layer table would simply not have an
assigned layer.
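A lookup like this could be sketched as follows, assuming each line of the table is "object-filename: value". The names parse_lookup_table and layer_of are hypothetical; an object missing from the table simply comes back as None.

```python
def parse_lookup_table(text):
    """Map object file names to the property value listed for them."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last colon, so only the value after it is
        # treated as the property.
        name, sep, value = line.rpartition(":")
        if not sep:
            raise ValueError(f"not a 'name: value' line: {line!r}")
        table[name.strip()] = value.strip()
    return table


def layer_of(table, object_name):
    """Return the layer of an object, or None if it has no layer."""
    return table.get(object_name)
```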
o Blocks, Xrefs, groups and symbols are all the same thing.
As soon as you start treating a drawing as "a directory full of
files", other things start becoming very obvious; for instance, a
block/symbol within a drawing is simply a sub-directory inside
the drawing. This sub-directory is then editable as a drawing in
its own right.
Of course this needs to be referenced by the parent drawing in
order to be used, so we have another "reference" object type
similar to line, circle etc.
A reference to a block might contain data like this:
Centre-X: 567.890
Centre-Y: 7654.321
Centre-Z: 0.0
Location: celery-1045318078-4358-6447.drawing/
Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
Units: millimetres
An Xref, which is normally an external embedded drawing, would
have an almost identical format, except with a qualified path:
Location: ../windows/casement-full-height.drawing/
Normal, non-drawing objects can be referenced:
Location: celery-1045317918-1278-225423.circle
Embedded data can be referenced too:
Location: logo.png
Location: ../specification/windows.txt
o Polymorphism via diff/patch files.
Any aspect of an object can be overridden, by referencing an
associated diff file at the same time.
So if you want a circle that is exactly the same as another
circle in every way, except for the radius; then your reference
file would have this line in it:
Patch: celery-1045318234-1278-67457.patch
Advantages
----------
o Easy access to data using standard non-CAD tools.
Since the data is extremely parse-able, difficult things like
database and report generation become easy.
For example, if you need to generate a door schedule from a
drawing set, simply use the grep tool to find all references to
doors, then search the results to count the number of
left-handed doors etc.
Mass manipulation of data is also practicable. Need to find all
lines that are on layer "walls" and that have Z coordinates
greater than zero? That's a simple Perl one-liner. Want to delete
them? That's easy, just `|xargs rm`. Want to move them to another
drawing entirely? That's easy too.
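The "walls with Z greater than zero" query might look like this as a short script rather than a one-liner. This is only a sketch: the format above doesn't define the fields of a line object, so "Start-Z" and "End-Z" are invented here, and "greater than zero" is read as "at either end". The function names are hypothetical too.

```python
from pathlib import Path


def read_fields(path):
    """Read 'Key: value' lines from one file into a dict."""
    fields = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle.read().splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def lines_on_layer_above_zero(drawing_dir, layer="walls"):
    """List .line files on the given layer with any Z above zero."""
    drawing = Path(drawing_dir)
    # Collect the names of all objects assigned to the layer.
    on_layer = set()
    for table in drawing.glob("*.layer"):
        for name, value in read_fields(table).items():
            if value == layer:
                on_layer.add(name)
    matches = []
    for path in sorted(drawing.glob("*.line")):
        if path.name not in on_layer:
            continue
        fields = read_fields(path)
        # "Start-Z" and "End-Z" are assumed field names.
        z_values = [float(fields[key])
                    for key in ("Start-Z", "End-Z") if key in fields]
        if any(z > 0 for z in z_values):
            matches.append(path)
    return matches
```

Deleting or moving the matches is then a loop over the returned paths, calling unlink() or rename() on each.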
o Diff files are clean and descriptive.
Most diff files should be (almost) human readable; inspection of
revised drawings becomes an examination of diff files rather than
hunting around for a "revision-cloud".
CAD software should help by allowing users to visually browse all
the differences.
o RCS-able data, revisions are managed by CVS or Subversion.
By making all data Revision Control System friendly, drawing
management becomes much easier. Big teams can have access to
current data via CVS even when geographically separated.
Rolling back to specific release dates is simple and reliable,
changes can even be browsed with standard free web-interfaces.
Systems like CVS and Subversion can be properly secured with
access permissions and data encryption; plus many drawing
repositories will want to be publicly readable, this is easy too.
With Subversion, each drawing would have a public permanent URL
accessible via the web.
o Multiple users can edit the same drawing at the same time.
All files are small and contain small amounts of atomic data,
this means that there are no file-locking issues whatsoever.
It would be a little crazy, but there is no reason why two people
couldn't be editing different ends of a floor plan at the same
time. The CAD program could even update the display dynamically.
o Fast saving.
Opening one of these drawings is going to be a slow operation,
but saving will be fast and efficient. You save more often than
you open, so this may lead to better performance overall.
o Merging two drawings is achieved by collapsing directory trees.
Since all objects within a drawing have unique names, two or more
drawings can simply be merged together by dropping all the files
into the same place.
Exploding a block is the same as moving all the files from a
subdirectory into the parent.
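Merging by moving files could be sketched like this. The function name merge_drawings is hypothetical; refusing to overwrite on a name clash is an extra precaution, since unique names should make clashes impossible in practice.

```python
import shutil
from pathlib import Path


def merge_drawings(source_dir, target_dir):
    """Move every entry of source_dir into target_dir."""
    source, target = Path(source_dir), Path(target_dir)
    entries = sorted(source.iterdir())
    # Check for clashes before moving anything, so that a failed
    # merge leaves both drawings untouched.
    clashes = [entry.name for entry in entries
               if (target / entry.name).exists()]
    if clashes:
        raise FileExistsError(f"name clash in target: {clashes}")
    for entry in entries:
        shutil.move(str(entry), str(target / entry.name))
    return [entry.name for entry in entries]
```

The file-moving half of exploding a block is then the same call, with the block's sub-directory as the source and its parent drawing as the target.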
o Extensible file format.
Since any one CAD program would only ever manipulate the object
types that it knows about, other suppliers could add extensions
that wouldn't screw up existing programs.
For example, CAD program A might only know about simple objects
like line, circle, arc and text; if CAD program B starts creating
complex objects like mesh, sphere and door, then A should simply
be able to ignore them entirely without even touching the files.
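In code, "ignore what you don't know" can be as simple as filtering on the file extension. A sketch, where KNOWN_TYPES stands in for whatever program A supports and split_by_known_type is a made-up name:

```python
from pathlib import Path

# Object types that this (hypothetical) program understands.
KNOWN_TYPES = frozenset({"line", "circle", "arc", "text"})


def split_by_known_type(drawing_dir, known_types=KNOWN_TYPES):
    """Return (known, ignored) lists of object file names."""
    known, ignored = [], []
    for path in sorted(Path(drawing_dir).iterdir()):
        if not path.is_file():
            continue  # sub-directories (blocks) are handled elsewhere
        object_type = path.suffix.lstrip(".")
        if object_type in known_types:
            known.append(path.name)
        else:
            # Never opened or rewritten, so the files stay
            # byte-for-byte intact for the program that made them.
            ignored.append(path.name)
    return known, ignored
```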
--
Bruno