[freearchitecture] An open CAD file format

Sat Feb 15 17:00:20 GMT 2003

On Fri 14-Feb-2003 at 05:35:02 +0000, Bruno Postle wrote:
> 
> Here are my requirements for a properly open and RCS-able CAD file
> format:
> 
> 1. Text only, <cr><lf> delimited, human readable.

> 2. Blocks/references have full inheritance and polymorphism.

> 3. Resources (images, multiline text blocks) exist as separately
>    editable files, not embedded in larger files.

> 4. Changes to drawing objects can be made in-place without
>    regenerating/reordering the whole dataset.

I was going to do something else this afternoon :-) Anyway, here is
Bruno's Ideal CAD File Format; I really think that something this
hare-brained is needed to bring Computer Aided Design up to speed
with developments in Free Software.

(Apologies if this is a bit technical; it assumes some experience of
CAD as well as of file formats in general.)

Features
--------

o  A Drawing is a directory full of files.

   This is the one radical idea: instead of encapsulating all the
   data in a big structured file format like XML, simply use the
   file-system as a structured data store.

   The idea comes from the Maildir format for storing mail; this has
   lots of advantages over the traditional mbox single-file format
   (random-access, easy searches, simple deletes, simple appends, no
   locking).  With normal file-systems, Maildir performs just as
   well as a single-file format until there are tens of thousands of
   items in one directory; with modern file-systems like ReiserFS,
   there are no performance issues with millions of items.

o  One file per object.

   Every basic object (line, circle, spline, text, reference etc.)
   is stored in a single file, one object per file.  This means that
   if your drawing consists of 1000 circles, then there will be
   1000+ files in the directory representing that drawing.

o  Packaging via zip.

   Obviously thousands of files will seem extremely strange to
   people who are accustomed to one-file-per-drawing.

   For packaging purposes, all the drawing data can be simply
   zipped-up; this is the strategy adopted by OpenOffice.org - where
   each word-processing file is actually a zip file containing
   several other files.

   (tar.gz is better for this job, but zip has wider availability)
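
   As a rough illustration of the packaging step, here is a minimal
   Python sketch using the standard zipfile module.  The function
   names pack_drawing and unpack_drawing are made up for this
   example; nothing here is a defined part of the format.

```python
import zipfile
from pathlib import Path


def pack_drawing(drawing_dir, archive_path):
    """Zip every file under drawing_dir, keeping paths relative to it."""
    drawing_dir = Path(drawing_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(drawing_dir.rglob("*")):
            if path.is_file():
                # Store "sub.drawing/b.circle", not an absolute path.
                archive.write(path, path.relative_to(drawing_dir).as_posix())


def unpack_drawing(archive_path, drawing_dir):
    """Restore a packed drawing into drawing_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(drawing_dir)
```

   Note that this sketch only stores files, so an empty
   sub-directory would not survive the round trip, and the archive
   should be written somewhere outside the drawing directory.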

o  Persistent names for all objects.

   Each object needs a key/name/filename; this should be unique and
   persistent through the lifetime of the object.  Something like
   this:

       <HOSTNAME>-<UNIX_DATE>-<PID>-<INODE>.<OBJECT_TYPE>

   Here's an example:

       celery-1045317918-1278-225423.circle

   Some objects will need to be renamed so they can be remembered
   later; this name will then be accessible from within the CAD
   program:

       enough-room-to-swing-a-cat.circle
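
   One way such a name could be generated is sketched below in
   Python.  It borrows the Maildir trick of creating a placeholder
   file first so that there is an inode to name the object after.
   The function name new_object_path and the ".tmp-" placeholder
   prefix are assumptions made for this example only.

```python
import os
import socket
import time


def new_object_path(drawing_dir, object_type):
    """Create an empty object file with a unique name; return its path."""
    # Create a placeholder first so the file has an inode to be named after.
    tmp_path = os.path.join(
        drawing_dir, ".tmp-%d-%d" % (os.getpid(), time.time_ns()))
    fd = os.open(tmp_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        inode = os.fstat(fd).st_ino
    finally:
        os.close(fd)
    # <HOSTNAME>-<UNIX_DATE>-<PID>-<INODE>.<OBJECT_TYPE>
    name = "%s-%d-%d-%d.%s" % (
        socket.gethostname(), int(time.time()), os.getpid(), inode,
        object_type)
    final_path = os.path.join(drawing_dir, name)
    os.rename(tmp_path, final_path)
    return final_path
```

   The inode number is only meaningful on file-systems that have
   one; on other platforms some other per-file counter would have to
   stand in for it.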

o  One line per data item.

   The entire contents of a file might look something like this:

       Centre-X: 345.678
       Centre-Y: 9876.543
       Centre-Z: 0.0
       Radius: 246.8
       Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
       Units: millimetres

   Each line is <CR><LF> separated so that it can be reliably edited
   on various platforms; all data would be UTF-8.

   The diff for this after simply changing the radius of the circle
   would be human readable and look like this:

       diff .celery-1045317918-1278-225423.circle celery-1045317918-1278-225423.circle
       4c4
       < Radius: 246.8
       ---
       > Radius: 250.0
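
   Reading and writing such a file takes very little code.  The
   following Python sketch assumes that the key is everything before
   the first colon and that field order should be preserved so that
   diffs stay small; the function names are made up for the example.

```python
def parse_object(text):
    """Parse 'Key: value' lines into a dict, preserving their order."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def format_object(fields):
    """Write one 'Key: value' line per item, each ending in CR LF."""
    return "".join("%s: %s\r\n" % (key, value)
                   for key, value in fields.items())


def read_object(path):
    # newline="" stops Python from translating the CR LF line endings.
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_object(handle.read())


def write_object(path, fields):
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(format_object(fields))
```

   Changing fields["Radius"] and writing the object back touches
   only the one line, which is what keeps the diff readable.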

o  Object properties stored in lookup tables.

   Geometrical data, like a circle's radius, needs to be stored in
   the object itself; other types of property, such as layer, color,
   linetype, visibility etc., need to be stored in lookup tables.

   This is one of the features of existing file formats that needs
   to be preserved.  For example, in AutoCAD, "layer 0" doesn't
   indicate that a circle is "in layer zero", it means that the
   circle "doesn't have a layer defined".

   So there may be a lookup table for "layer" with a name like this:

       celery-1045317959-1278-3652.layer

   ..it might have content like this:

       celery-1045317918-1278-225423.circle:  walls
       celery-1045317921-1278-3565.circle:  walls
       celery-1045318034-1278-36467.circle:  windows
       celery-1045318056-1278-466875.circle:  windows
       enough-room-to-swing-a-cat.circle:  construction

   Any items not listed in a layer table would simply not have an
   assigned layer.
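
   A lookup against such a table could work as in the Python sketch
   below.  The helper names are hypothetical; the only behaviour
   taken from the description above is that an unlisted object has
   no layer at all, represented here by None.

```python
def parse_table(text):
    """Parse 'object-name: value' lines of a lookup table into a dict."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            name, _, value = line.partition(":")
            table[name.strip()] = value.strip()
    return table


def layer_of(table, object_name):
    """Return the layer of an object, or None if it has no layer."""
    return table.get(object_name)


def objects_on_layer(table, layer):
    """Return the sorted names of all objects assigned to a layer."""
    return sorted(name for name, value in table.items() if value == layer)
```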

o  Blocks, Xrefs, groups and symbols are all the same thing.

   As soon as you start treating a drawing as "a directory full of
   files", other things start becoming very obvious; for instance, a
   block/symbol within a drawing is simply a sub-directory inside
   the drawing.  This sub-directory is then editable as a drawing in
   its own right.

   Of course this needs to be referenced by the parent drawing in
   order to be used, so we have another "reference" object type
   similar to line, circle etc.

   A reference to a block might contain data like this:

       Centre-X: 567.890
       Centre-Y: 7654.321
       Centre-Z: 0.0
       Location: celery-1045318078-4358-6447.drawing/
       Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
       Units: millimetres

   An Xref, which is normally an external embedded drawing, would
   have an almost identical format, except with a qualified path:

       Location: ../windows/casement-full-height.drawing/

   Normal, non-drawing objects can be referenced:

       Location: celery-1045317918-1278-225423.circle

   Embedded data can be referenced too:

       Location: logo.png

       Location: ../specification/windows.txt
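
   How a Location is resolved is not spelled out above.  The Python
   sketch below assumes the simplest rule: a Location is a path
   relative to the directory that contains the reference file, and a
   trailing slash marks a drawing directory.  Both rules and the
   function names are assumptions for the sake of the example.

```python
import os


def resolve_location(reference_path, location):
    """Return the absolute path that a reference's Location points at."""
    base = os.path.dirname(os.path.abspath(reference_path))
    # normpath collapses "../" and drops any trailing slash.
    return os.path.normpath(os.path.join(base, location))


def points_at_drawing(location):
    """Guess whether a Location names a drawing (directory) or a file."""
    return location.endswith("/")
```

   Under this rule a block, an Xref, a single object and an embedded
   image all go through exactly the same lookup.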

o  Polymorphism via diff/patch files.

   Any aspect of an object can be overridden, by referencing an
   associated diff file at the same time.

   So if you want a circle that is exactly the same as another
   circle in every way, except for the radius; then your reference
   file would have this line in it:

       Patch: celery-1045318234-1278-67457.patch
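
   The contents of a .patch file are not defined here.  As a sketch
   only, the Python below assumes it holds normal-format diff output
   like the Radius example earlier, and it handles nothing but
   "change" hunks (such as 4c4); a real program would more likely
   hand the job to the patch tool.  The function name is made up.

```python
import re

# Matches hunk headers such as "4c4" or "2,3c2,4".
HUNK = re.compile(r"^(\d+)(?:,(\d+))?c(\d+)(?:,(\d+))?$")


def apply_change_hunks(lines, patch_lines):
    """Apply the 'c' hunks of a normal-format diff to a list of lines."""
    result = list(lines)
    offset = 0  # line shift caused by earlier hunks of unequal length
    i = 0
    while i < len(patch_lines):
        match = HUNK.match(patch_lines[i])
        if not match:
            raise ValueError("unsupported patch line: %r" % patch_lines[i])
        start = int(match.group(1))
        end = int(match.group(2) or start)
        i += 1
        old, new = [], []
        while i < len(patch_lines) and patch_lines[i].startswith("< "):
            old.append(patch_lines[i][2:])
            i += 1
        if i < len(patch_lines) and patch_lines[i] == "---":
            i += 1
        while i < len(patch_lines) and patch_lines[i].startswith("> "):
            new.append(patch_lines[i][2:])
            i += 1
        lo, hi = start - 1 + offset, end + offset
        # Refuse to patch an object that is not the one the diff was made from.
        if result[lo:hi] != old:
            raise ValueError("patch does not match object at line %d" % start)
        result[lo:hi] = new
        offset += len(new) - len(old)
    return result
```

   The referenced circle stays untouched on disk; the reference
   simply sees the patched lines when it is loaded.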

Advantages
----------

o  Easy access to data using standard non-CAD tools.

   Since the data is extremely parse-able, difficult things like
   database and report generation become easy.

   For example, if you need to generate a door schedule from a
   drawing set, simply use the grep tool to find all references to
   doors, then search the results to calculate the number of
   left-handed doors etc.

   Mass manipulation of data is also practicable.  Need to find all
   lines that are on layer "walls" and that have Z coordinates
   greater than zero?  That's a simple Perl one-liner.  Want to
   delete them?  That's easy, just `|xargs rm`.  Want to move them
   to another drawing entirely?  That's easy too.
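
   The "walls with Z above zero" query might look like this in
   Python rather than Perl.  The fields of a line object are not
   defined anywhere in this post, so the sketch assumes that any
   field whose key ends in "-Z" (say Start-Z and End-Z, both
   hypothetical) is a Z coordinate, and that one positive value is
   enough to match.

```python
from pathlib import Path


def parse_fields(text):
    """Parse 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def find_objects(drawing_dir, layer, suffix=".line"):
    """Return paths of objects on a layer with any Z coordinate > 0."""
    drawing_dir = Path(drawing_dir)
    # Collect the names listed under this layer in every lookup table.
    on_layer = set()
    for table in drawing_dir.glob("*.layer"):
        entries = parse_fields(table.read_text(encoding="utf-8"))
        on_layer.update(n for n, value in entries.items() if value == layer)
    matches = []
    for name in sorted(on_layer):
        path = drawing_dir / name
        if path.suffix != suffix or not path.is_file():
            continue
        fields = parse_fields(path.read_text(encoding="utf-8"))
        if any(key.endswith("-Z") and float(value) > 0
               for key, value in fields.items()):
            matches.append(path)
    return matches
```

   Deleting the matches is then a loop calling path.unlink(), though
   the stale entries in the layer table would need removing too.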

o  Diff files are clean and descriptive.

   Most diff files should be (almost) human readable; inspection of
   revised drawings becomes an examination of diff files rather than
   hunting around for a "revision-cloud".

   CAD software should help by allowing users to visually browse all
   the differences.

o  RCS-able data, revisions are managed by CVS or Subversion.

   By making all data Revision Control System friendly, drawing
   management becomes much easier.  Big teams can have access to
   current data via CVS even when geographically separated.

   Rolling back to specific release dates is simple and reliable;
   changes can even be browsed with standard free web-interfaces.

   Systems like CVS and Subversion can be properly secured with
   access permissions and data encryption; plus many drawing
   repositories will want to be publicly readable, this is easy too.

   With Subversion, each drawing would have a public permanent URL
   accessible via the web.

o  Multiple users can edit the same drawing at the same time.

   All files are small and contain small amounts of atomic data;
   this means that there are no file-locking issues whatsoever.

   It would be a little crazy, but there is no reason why two people
   couldn't be editing different ends of a floor plan at the same
   time.  The CAD program could even update the display dynamically.
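
   Doing without locks does rely on each small file being replaced
   in one step.  A minimal Python sketch of that, in the style of
   Maildir delivery: write a temporary file in the same directory,
   then rename it over the object.  The function name save_object
   and the ".tmp-" prefix are assumptions for this example.

```python
import os
import tempfile


def save_object(path, text):
    """Replace an object file in one step, so readers never see half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory (same file-system)
    # for the final rename to be a single atomic operation.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

   This only protects against half-written files.  If two people
   change the very same object, the last save still wins, and that
   is a job for the revision control system.  mkstemp also creates
   the file readable by its owner only, so a shared drawing would
   need the permissions widening.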

o  Fast saving.

   Opening one of these drawings is going to be a slow operation,
   but saving will be fast and efficient.  You save more often than
   you open, so this may lead to better performance overall.

o  Merging two drawings is achieved by collapsing directory trees.

   Since all objects within a drawing have unique names, two or more
   drawings can simply be merged together by dropping all the files
   into the same place.

   Exploding a block is the same as moving all the files from a
   subdirectory into the parent.
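
   A merge along these lines could be sketched in Python as below.
   Unique names should make clashes rare, but hand-renamed objects
   could still collide, so this sketch (with a made-up function
   name) checks first and moves nothing if any name is taken.

```python
import shutil
from pathlib import Path


def merge_drawings(source_dir, target_dir):
    """Move every entry of source_dir into target_dir; return the names."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    entries = sorted(source_dir.iterdir())
    # Check everything before moving anything, so a clash leaves
    # both drawings exactly as they were.
    clashes = [e.name for e in entries if (target_dir / e.name).exists()]
    if clashes:
        raise FileExistsError("name clash: " + ", ".join(clashes))
    for entry in entries:
        shutil.move(str(entry), str(target_dir / entry.name))
    return [e.name for e in entries]
```

   Exploding a block would be the same call with the sub-directory
   as the source, although a complete version would presumably also
   have to apply the reference's Transform to the moved objects and
   remove the reference itself.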

o  Extensible file format.

   Since any one CAD program would only ever manipulate the object
   types that it knows about, other suppliers could add extensions
   without breaking existing programs.

   For example, CAD program A might only know about simple objects
   like line, circle, arc and text; if CAD program B starts creating
   complex objects like mesh, sphere and door, then A should simply
   be able to ignore them entirely without even touching the files.
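
   For program A, ignoring unknown objects could be as simple as
   filtering on the filename extension before opening anything.  A
   Python sketch, in which KNOWN_TYPES and load_known are names
   invented for the example:

```python
from pathlib import Path

# The object types that hypothetical "CAD program A" understands.
KNOWN_TYPES = {"line", "circle", "arc", "text"}


def load_known(drawing_dir, known_types=KNOWN_TYPES):
    """Read objects of known types; list the rest without opening them."""
    loaded, ignored = {}, []
    for path in sorted(Path(drawing_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lstrip(".") in known_types:
            loaded[path.name] = path.read_text(encoding="utf-8")
        else:
            # Unknown extension: leave the file completely untouched.
            ignored.append(path.name)
    return loaded, ignored
```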

-- 
Bruno