File Formats

This page is an overview of file formats used by Tunnel as well as the context in which the design of Tunnel has been made. I am not an expert on all the offerings that are available from the many other cave surveying programs. What I hope for is that this program contains some features which none of the others have.


Raw Survey Data

There is no limit to the amount of data that can be included in a cave survey, just as there is no limit to the size of a cave. It is therefore a good idea to break what could be a monolithic stack of data down into smaller, more manageable chunks.

Probably the most stable element of a cave survey is a single survey trip. This is the total set of observations made by one surveying team during one visit to a cave. Compared to alternative ways of dividing a cave survey into smaller pieces (by passage series, region, etc), this is guaranteed to produce pieces of roughly similar size without the need for prescience (the foreknowledge of how the cave will look when fully explored), and with an allocation that is as immutable as historical fact.

It is not possible for one measurement or observation to belong to two separate trips, so it is obvious where each piece of data belongs. However, it is possible for two observations in separate trips to be made of the same object. When this happens, the observations must be associated. There are always some overlaps or associations between separate survey trips, for this is how the separate surveys are bound together into the whole: they share adjoining survey stations.

Observations (and information) made during a single survey trip are generated in many forms: survey notes, measurements, logbook trip reports, photographs, sketches. Since computer memory is now cheap and paper is all too frequently lost, all pictorial information and handwritten notes should be scanned into the computer in one form or another. It is a good idea to place everything from a single trip into a single directory.

Because computers are hopeless at text recognition (particularly if it's smudged handwriting), the survey notes and measurements need to be transcribed into text. A good file format to use is a ".svx" file, which is pure ASCII text and can be read by Survex as well as Tunnel. The layout of the text in the file should be the same as on the survey note paper so that it is easy to compare and check for mistakes. Survex can read a wide variety of data formats as long as they are properly specified at the beginning of the file with the "*units", "*data" and "*calibrate" commands. Tunnel implements most (but not all) of these difficult features.

An example of scanned cave survey notes.
;from to tape compass clino ; remarks
1 2 8.23 205.5 -19  ;
3 2 1.80 010 +15
3 4 6.42 290 -12
5 4 3.75 116 +10
5 6 2.81 251 -16
7 6 3.06 136 +18
7 8 3.40 281 -12
9 8 4.33 136 +11

;station details
;name L R U D E description
;1 .85 0 1.00 1.10  rs
;2 1.20 0 0.9 0.7  rs
;3 N/A 0 0.4 0.8  c below ridge
;4 0 0.6 0.7 0.6  pencil below c
;5 0 0.7 0.9 0.7  c on point on inside of bend
;6 0 0.6 0.65 0.9  p on left
;7 0.7 0 0.6 0.8  c on spike on RHS
;8 0 0.6 0.4 0.9  p on wall
;9 1.1 0 0.8 1.0  c on right wall

An example of the same notes transcribed into machine readable form.
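
As a rough illustration of how little is needed to read rows like these, here is a minimal Python sketch. It is not how Tunnel or Survex parse ".svx" files: it assumes the fixed column order "from to tape compass clino", treats everything after a ";" as a comment, and simply skips any line beginning with "*". The function name `parse_legs` is made up for this example.

```python
def parse_legs(text):
    """Parse 'from to tape compass clino' rows, ignoring ';' comments."""
    legs = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop everything after ';'
        if not line or line.startswith("*"):
            continue  # blank, comment-only, or a command this sketch ignores
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete leg
        frm, to = fields[0], fields[1]
        tape, compass, clino = (float(v) for v in fields[2:5])
        legs.append((frm, to, tape, compass, clino))
    return legs


SAMPLE = """\
;from to tape compass clino ; remarks
1 2 8.23 205.5 -19  ;
3 2 1.80 010 +15
"""

legs = parse_legs(SAMPLE)
for leg in legs:
    print(leg)
```

Station names are kept as strings because nothing in the notes requires them to be numbers.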

Once the survey data is in a form that can be read by a computer, it can be turned into a line image by doing a series of routine geometrical calculations. These calculations are little more than three-dimensional trigonometry, in most cases involving converting tape, compass and clino measurements into an XYZ vector. Once the relative vectors between each connected pair of fixed points, or survey stations, are known, the position of all the points can be found relative to the entrance by summing the vectors of the paths from the entrance to each point. If there is only one route through the survey to any point, the answer is clear. However, if there are loops in the survey, as there frequently are in caves, there will be error discrepancies depending on which of the many alternative paths the computer traces from the entrance to each point.
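
The trigonometry can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tunnel's code: the compass reading is taken as a bearing in degrees clockwise from north, the clino as an inclination in degrees with positive meaning upwards, and no calibration or declination is applied. The names `leg_vector` and `station_positions` are invented for the example, and legs that close a loop are simply ignored.

```python
import math


def leg_vector(tape, compass_deg, clino_deg):
    """Return the (east, north, up) vector for one leg."""
    bearing = math.radians(compass_deg)
    incline = math.radians(clino_deg)
    horizontal = tape * math.cos(incline)
    return (horizontal * math.sin(bearing),
            horizontal * math.cos(bearing),
            tape * math.sin(incline))


def station_positions(legs, start):
    """Sum leg vectors outwards from `start`; loop-closing legs are skipped."""
    positions = {start: (0.0, 0.0, 0.0)}
    pending = list(legs)
    progress = True
    while pending and progress:
        progress = False
        for leg in list(pending):
            frm, to, tape, compass, clino = leg
            dx, dy, dz = leg_vector(tape, compass, clino)
            if frm in positions and to not in positions:
                x, y, z = positions[frm]
                positions[to] = (x + dx, y + dy, z + dz)
            elif to in positions and frm not in positions:
                x, y, z = positions[to]
                positions[frm] = (x - dx, y - dy, z - dz)  # leg walked backwards
            elif frm not in positions and to not in positions:
                continue  # neither end is known yet; try again on a later pass
            pending.remove(leg)
            progress = True
    return positions


legs = [("1", "2", 10.0, 90.0, 0.0), ("3", "2", 5.0, 0.0, 0.0)]
positions = station_positions(legs, "1")
```

The second leg is recorded from station 3 to station 2, so station 3 is found by subtracting the vector from station 2, which is why the survey can be entered in whatever direction the notes were taken.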

A good way to deal with the problem of these loops is to somehow average between the alternative positions and spread the error evenly around the stations in the loop. The authors of Survex have devised an algorithm to do this, but it has not been implemented in Tunnel. However, Tunnel can read the output of Survex and set the positions of the stations in the overall survey sketch.
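
To make the idea of spreading the error concrete, here is a deliberately naive Python sketch. It is not the algorithm used by Survex and it is not part of Tunnel: it takes the leg vectors around a single closed loop, measures how far their sum misses zero, and shares that misclosure equally between the legs. The function name `close_loop` and the sample numbers are invented for the example.

```python
def close_loop(vectors):
    """Adjust the leg vectors of one closed loop so that they sum to zero.

    The misclosure is shared equally between the legs, whatever their length.
    """
    n = len(vectors)
    misclosure = [sum(v[i] for v in vectors) for i in range(3)]
    return [tuple(v[i] - misclosure[i] / n for i in range(3)) for v in vectors]


# A square loop whose last leg is slightly off, so the loop fails to close.
loop = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (-10.0, 0.0, 0.0), (0.4, -10.0, 0.2)]
adjusted = close_loop(loop)
```

A real survey network has many interlocking loops and legs of differing reliability, so an equal share per leg is only a starting point for thinking about the problem.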

Raw Sketch Data

An example of a scanned drawing of a survey drawn over a plot of the surveyed centreline.

Tunnel is a cave drawing system (among its other attempts at being a 3D cave modeller). In the same way as you can transcribe the text from the survey notes into a format that the computer can use, you can transcribe or trace out the sketches of the passage outlines using the sketch editor.

(I have experimented with alternative ways of transcribing the sketch, from using a graphics pad to sketching with a mouse behind an acetate sheet stuck to the screen. The method of drawing the lines to follow a background image has so far proven to be the most effective.)

There are six line types for sketching the cave survey: wall, estimated wall, pitch boundary, ceiling boundary, detail and invisible. Each of these can be splined to make them appear smooth for printing. When the sketching is complete, the computer can search the sketch and identify the closed regions. Symbols, such as gradient arrows and boulder fields will be restricted to these regions when they are rendered so that they don't cross the lines and get drawn outside the cave.

The image has been loaded into the background workspace of Tunnel and then traced in the editor. The red lines are the centreline and the blue lines are the walls of the passage.

Sketches are drawn for the surveys from each trip over the centreline data. When the full cave is assembled, each piece of centreline data finds itself in some place in the full drawing. Usually this centreline will not match exactly because it will have been slightly distorted to account for the loop errors in the big picture. Fortunately, Tunnel is able to copy these distortions of the centreline and morph the sketched curves of the passage walls when it copies them into the big picture. The only task that remains is connecting up the open-ended walls between the surveys and resolving the discrepancies.

Each sketch is saved into an XML file (eXtensible Markup Language), which is like a generalized version of HTML in that all the commands are user-defined. Once you get the idea of the syntax, it's very simple. Commands are opened with a '< command >' and closed with a '< /command >', and any command can have a list of attributes of the form 'attribute="value"'. Some commands close immediately after they have opened. These can be abbreviated to '< command/ >'. I have just explained everything I know about XML. I have found no use for document formats, DTDs, Schemas and the like, since all they do is automatically tell you whether an XML file is valid or not. But I can easily check for validity by loading it into Tunnel and listening for an error.

See below for a partial example of an XML file for the sketch shown in the pictures.

< tunnelxml>
< sketch splined="0">
 < affinetrans aftrm00="0.09926831411019867" aftrm01="0.0" aftrm10="0.0" ... >
  < backimage imgfile="../2000 stuff multisurveys/110_bidetplan.png"/>
 < /affinetrans>
 < skpath from="2" to="4" linestyle="centreline">
  < label>< tail> 7< /tail>< head> 5< /head>< /label>
  < pt X="0.7547953" Y="-8.267923" Z="0.0"/>
  < pt X="-2.530469" Y="9.2143755" Z="0.0"/>
 < /skpath>
 < skpath from="2" to="1" linestyle="centreline">
  < label>< tail> 7< /tail>< head> 8< /head>< /label>
  < pt X="0.7547953" Y="-8.267923" Z="0.0"/>
  < pt X="2.6320891" Y="-13.606478" Z="0.0"/>
 < /skpath>
 < skpath from="1" to="3" linestyle="centreline">
  < label>< tail> 8< /tail>< head> 9< /head>< /label>
  < pt X="2.6320891" Y="-13.606478" Z="0.0"/>
  < pt X="-10.039644" Y="-10.438544" Z="0.0"/>
 < /skpath>
 < skpath from="1" to="0" linestyle="centreline">
  < label>< tail> 8< /tail>< head> 10< /head>< /label>
  < pt X="2.6320891" Y="-13.606478" Z="0.0"/>
  < pt X="3.160078" Y="-25.046238" Z="0.0"/>
 < /skpath>
 < skpath from="5" to="17" linestyle="wall">
  < pt X="-11.6236105" Y="-9.382566" Z="0.0"/>
  < pt X="-9.628986" Y="-9.793225"/>
  < pt X="-8.103685" Y="-10.145218"/>
  < pt X="-6.402387" Y="-10.4972105"/>
  < pt X="-3.8211083" Y="-11.318526"/>
  < pt X="-1.8264837" Y="-11.787849"/>
  < pt X="-1.2984948" Y="-11.963846"/>
  < pt X="-1.1811639" Y="-11.1425295"/>
  < pt X="-1.5331565" Y="-9.734559"/>
  < pt X="-1.9438145" Y="-7.9745965"/>
  < pt X="-2.2958071" Y="-4.6893325"/>
  < pt X="-2.7651305" Y="-2.1080532"/>
  < pt X="-2.7382674" Y="-1.4096141" Z="0.0"/>
 < /skpath>

An example of part of the xml data representing the sketch in the images above. It is all in human readable form, even if it does not entirely make sense. The "backimage" command on the fourth line specifies the filename of the scanned image used for the background. The objects called "skpath" connect from node to node along an XY path and each have a "linestyle".
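
Because the format is plain XML, any general-purpose XML library can pull the paths back out. The Python sketch below uses the standard `xml.etree.ElementTree` module on a shortened copy of the listing. It assumes the tags are written as ordinary XML, without the space after "<" that appears in the listing on this page, and the function name `read_paths` is invented for the example.

```python
import xml.etree.ElementTree as ET

SKETCH = """\
<tunnelxml>
<sketch splined="0">
 <skpath from="2" to="4" linestyle="centreline">
  <pt X="0.7547953" Y="-8.267923" Z="0.0"/>
  <pt X="-2.530469" Y="9.2143755" Z="0.0"/>
 </skpath>
 <skpath from="5" to="17" linestyle="wall">
  <pt X="-11.6236105" Y="-9.382566" Z="0.0"/>
  <pt X="-9.628986" Y="-9.793225"/>
  <pt X="-2.7382674" Y="-1.4096141" Z="0.0"/>
 </skpath>
</sketch>
</tunnelxml>
"""


def read_paths(xml_text):
    """Return one dict per skpath with its end nodes, linestyle and XY points."""
    root = ET.fromstring(xml_text)
    paths = []
    for skpath in root.iter("skpath"):
        points = [(float(pt.get("X")), float(pt.get("Y")))
                  for pt in skpath.iter("pt")]
        paths.append({"from": skpath.get("from"),
                      "to": skpath.get("to"),
                      "linestyle": skpath.get("linestyle"),
                      "points": points})
    return paths


paths = read_paths(SKETCH)
for path in paths:
    print(path["linestyle"], path["from"], "->", path["to"], len(path["points"]), "points")
```

Only X and Y are read here, since the intermediate points in the listing carry no Z attribute.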

Processed survey data

In general, a directory for a survey trip that has been processed by Tunnel contains two further files that bear explaining. The full list of files in a sample directory runs like: "survey_trip_notes.png" for the scanned notes and drawings, "survey_trip.svx" for the transcribed data, "survey_trip-sketch0.xml" for the computerized sketches of the drawings, "survey_trip.xml" for the interpreted survey data, and "survey_trip-exports.xml" for the connections to other parts of the cave.

< tunnelxml>
< measurements name="110_bidet_b">
< set date="2000.07.27">
< set title="110_bidet_b">
< set tapeperson="Martin">
< leg from="7" to="5">
 < tape fval="7.6"/>
 < compass fval="192.0"/>
 < clino fval="-6.0"/>
< /leg>
< leg from="7" to="8">
 < tape fval="2.5"/>
 < compass fval="20.0"/>
 < clino fval="24.0"/>
< /leg>
< leg from="8" to="9">
 < tape fval="8.14"/>
 < compass fval="262.0"/>
 < clino fval="-47.0"/>
< /leg>
< leg from="8" to="10">
 < tape fval="5.17"/>
 < compass fval="2.5"/>
 < clino fval="21.0"/>
< /leg>
< /set>
< /set>
< /set>
< /measurements>
< /tunnelxml>

An example of a survey XML file for a short trip composed of four legs.

These last two files are written but not yet read by Tunnel. (Tunnel can already read the raw Survex format, which contains the same information, so I have not needed to implement a reader for them.) However, these two files represent the native form of data used inside Tunnel, so you can look at them to see if any mistakes have been made. They make a good demonstration of the way Tunnel uses the XML format to store information.

When the CaveXML project finally produces its own format it will be easy to write a converter from the Tunnel format to it because the data should be exactly the same, and the Tunnel format is rather straightforward.

What makes the Tunnel format simple is the idea that there is only one element of real information: a discrete measurement, be it a tape, compass, clino, depth gauge, altimeter or GPS reading. All other things are attributes of one of these types of measurements. In particular, survey legs, or "shots" do not exist insofar as they are nothing more than groupings of, usually, three discrete measurements.

A great deal of complexity is saved by taking the view that survey legs don't really exist because there are so many variations of backsights, foresights, instrument types and combinations thereof. The variations involved in specifying a single measurement, on the other hand, are limited. It's usually one instrument, one person (two if it is a tape, although only one person reads it), two locations (or one if it is point measurement like a depth gauge or temperature reading), and one instantaneous moment in time.

Why, then, does the example above appear to contain "leg"s? Well, there are two types of command in the Tunnel XML format. There are the actual measurement records, "tape", "compass" and "clino" in this example, and then there are the attribute holders, "set" and "leg". Attribute holders are all the same, but some of them use the word "leg" instead of "set" to make it easier for people to decode the data. The purpose of the attribute holder is to reduce the repetitive parts of the data. For example, the first three measurements in the above file could equivalently have read:

< tape fval="7.6" date="2000.07.27" title="110_bidet_b" tapeperson="Martin" from="7" to="5"/>
< compass fval="192.0" date="2000.07.27" title="110_bidet_b" from="7" to="5"/>
< clino fval="-6.0" date="2000.07.27" title="110_bidet_b" from="7" to="5"/>

But, for example, the "date" attribute is the same across all of them, so it might as well be bracketed by the command "< set date="2000.07.27" > ... < /set >", as it has been in the example. If the same attribute is set twice, the inner one takes precedence, so if you were keen on showing a backsight as a part of a leg, or changing the tapeperson at one instance, this information could be filled in.

< leg from="7" to="5">
 < tape fval="7.6" tapeperson="Julian"/> < --I held the tape-- >
 < compass fval="192.0"/>
 < clino from="5" to="7" fval="-6.0"/> < --a backsight-- >
< /leg>

Here, "fval" stands for "floating point value". Maybe I could have used the word "reading".
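
The rule that attribute holders only pass attributes down, with the inner value winning, can be sketched in Python as follows. This is an illustration rather than Tunnel's own reader: it assumes ordinary XML without the space after "<", treats only "set" and "leg" as holders, and hands every inherited attribute to every measurement. In the flattened example above, "tapeperson" appears only on the tape reading, so a real reader presumably also drops attributes that are irrelevant to an instrument; that filtering is not attempted here. The name `flatten` is invented for the example.

```python
import xml.etree.ElementTree as ET

HOLDERS = {"set", "leg"}  # the attribute holders named in the text

SAMPLE = """\
<set date="2000.07.27" tapeperson="Martin">
 <leg from="7" to="5">
  <tape fval="7.6" tapeperson="Julian"/>
  <compass fval="192.0"/>
  <clino from="5" to="7" fval="-6.0"/>
 </leg>
</set>
"""


def flatten(element, inherited=None):
    """Yield (tag, attributes) for every measurement below `element`."""
    attrs = dict(inherited or {})
    attrs.update(element.attrib)  # inner values override outer ones
    if element.tag in HOLDERS:
        for child in element:
            yield from flatten(child, attrs)
    else:
        yield element.tag, attrs


records = list(flatten(ET.fromstring(SAMPLE)))
for tag, attrs in records:
    print(tag, attrs)
```

The tape record ends up with tapeperson "Julian" and the clino record with the reversed stations of the backsight, while the compass record keeps everything it inherited from the enclosing holders.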

This XML format, which is used for recording the sketch data as well as survey data, has proven to be very easy to design and code. Many of those hard, arbitrary choices between what is a record, command or attribute have been made for you once you decide that measurements (or observations) are the only records that exist, and everything else is an attribute of them. (It means that if you go on a survey trip, but you do not participate in any measurements, your name does not have to appear!)

It would be nice if it were adopted by the community, but it doesn't bother me because I am ultimately more interested in programming than in file format design.

Joining the surveys together

The method of joining, or associating same stations, between survey trips, as well as the allocation of data into files and directories has been a subject of contention between Tunnel and Survex. A previous version of Tunnel tried to abolish the "*equate" command which makes any two (or more) named stations lower down in the "survey tree" become the same object. I tried to replace it with a more structured "*export" command that could only associate a station in the current survey trip to a symbol one level higher. This is not to be confused with the "*export" command as it was eventually implemented in Survex, which acts more like an "extern" command from the 'C' language in that it does nothing but state which stations can be referenced by an "*equate" command from above. Thus it merely served to add yet another command to an already complex system without taking anything away.

In the current version of Tunnel I have decided to go further and make the case for abolishing the "*begin", "*end" and "*include" commands which between them cause nothing but trouble. When these commands are used in a user friendly way (ie, such that the person at the computer can see at a glance where all his files are), file names match survey names and there is exactly one survey per file. In these cases these commands confer no information whatsoever.

The term "survey tree" refers to the grouping of survey trips made after they have been completed. The groupings are not necessarily fixed and can change according to how the cave exploration develops. Often, the survey trips can be grouped by cave entrance, or branch. For example, "Entrance-A" leads to "Branch-1" and "Branch-2", and the first contains survey trips "Trip-Z", "Trip-Y" while the second contains "Trip-T". This forms a tree with three levels, the first with one node, the second with two nodes and the third with three nodes. In Survex this could be expressed in the following three files:

 *begin Entrance-A
   *begin Branch-1
     *include "Branch1/Trip-Z.svx"
     *begin Trip-Y
       *include "cavebit.svx"
     *end Trip-Y
   *end Branch-1
   *begin Branch-2
     *begin Trip-T
       ; some survey data for Trip-T
     *end Trip-T
   *end Branch-2
 *end Entrance-A

 *begin Trip-Z
   ; some survey data
 *end Trip-Z

 ;some survey data

The "*include" command inserts the text of the file it references at its own location within the file in which it sits. It's a feature borrowed from the 'C' language, but it leaves open, as shown in this example, the question of whether the "*begin" and "*end" commands which build the "survey tree" belong outside or inside the "included" file, and leaves you free to pick whatever unhelpful name or directory location you desire for that file. In other words, you get two trees modelled by your data: the first is the "survey tree", defined by the "*begin"s and "*end"s, and the second is the tree of files defined by the "*include" commands. These trees are often very similar and can cause no end of hassle when they differ, because there is rarely an adequate reason for them to.

In Tunnel I am suggesting that you need neither. Since each survey trip is contained in exactly one directory (along with all the associated scans of survey notes and drawings), you can use the directory structure to define the "survey tree". When Tunnel loads from its native format, it starts by pointing at a directory rather than at a specific file. It loads everything by recursing through the files in that directory as well as down into the other directories it contains.
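
The idea of letting the directories stand in for the "survey tree" can be sketched with Python's standard `os.walk`. This is not Tunnel's loader: the function name `survey_tree` is invented, and joining directory names with dots to form a survey name is an assumption borrowed from the way Survex writes nested survey names.

```python
import os
import tempfile


def survey_tree(root):
    """Map each survey name (derived from its directory path) to its files."""
    tree = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        rel = os.path.relpath(dirpath, root)
        name = "" if rel == "." else rel.replace(os.sep, ".")
        tree[name] = sorted(filenames)
    return tree


# Build a tiny throwaway directory tree to demonstrate the recursion.
with tempfile.TemporaryDirectory() as root:
    trip_dir = os.path.join(root, "Branch-1", "Trip-Z")
    os.makedirs(trip_dir)
    open(os.path.join(trip_dir, "Trip-Z.svx"), "w").close()
    tree = survey_tree(root)

print(tree)
```

No "*begin", "*end" or "*include" is needed to recover the hierarchy: the nesting of the directories already says which trip belongs to which branch.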

However, to make it less awkward, Tunnel doesn't actually force you to (un)mangle or touch your sacred Survex data in whatever state it lies in. Tunnel can read in almost every example of Survex data because it interprets the "*include" and other commands properly. However, when it saves the data (in particular, any sketches you may have made) it needs to create a directory tree to match the "survey tree". You can reload from this in native mode and work on sketches. You can also load from your original Survex files when there is a change and resave it over this directory structure (it won't over-write the sketches). Alternatively, if you trust it, you could decide to use Survex on the Tunnel generated ".svx" files which should be valid because they do contain consistent "*begin", "*end", "*include" and "*equate" commands even though they are not required by Tunnel when loading in native mode.

Export Files

The main topic of disagreement with Survex was the "*equate" command. An "*equate" references down into the data of a survey trip from above and associates two or more survey stations to be the same. I have long advocated getting rid of this command because it is like the "goto" in a high level programming language: too powerful in scope, regardless of its convenience. My alternative proposal can be found in a "-exports.xml" file with lines like:

< export estation="5" ustation="aday2bidet.5"/  >

What this says is that the survey station "5" should be associated to a station called "aday2bidet.5" in the directory above. If the directory above contains another survey trip directory which exports one of its stations upwards to the same name, "aday2bidet.5", then the two stations are equal or "equated". Granted it takes two separate commands to do this rather than a single equate, but this arguably has more structure. If, in the above example, we discovered a link between a station in "Trip-Z" and "Trip-T", we would indeed have to use four separate "export"s to kick the references to the stations up two levels each to where "Branch-1" and "Branch-2" were common, instead of using one single equate that reached down through the levels below on either side of the divide. This is the main criticism of this proposal. Perhaps, if it is the case that "Trip-Z" and "Trip-T" are being frequently connected, it is time to reconsider the "Branch-1" and "Branch-2" bifurcation and break up the cave structure someplace else. When you perform this feat of rearranging, "export"s will prove to be a lot easier to handle than "equate"s. For a start, an "export" makes no reference to the name of the directory. And secondly, I will probably write some tools to make it easy to move the directories around while keeping all the associations the same.
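
The "Trip-Z" to "Trip-T" case can be worked through in a short Python sketch. It is an illustration of the proposal, not Tunnel's implementation: each directory is represented by a tuple of names, its exports by a dictionary from "estation" to "ustation", and the station numbers and link names ("zlink", "tlink", "ztlink") are made up for the example.

```python
from collections import defaultdict


def equated_groups(exports):
    """Group stations whose chains of exports end at the same upper-level name.

    `exports` maps a directory path (a tuple of names) to a dict of
    {estation: ustation}, one entry per export line in that directory.
    """
    def climb(path, station):
        # Follow exports upwards until the station is no longer exported.
        while station in exports.get(path, {}):
            station = exports[path][station]
            path = path[:-1]  # the ustation lives in the directory above
        return path, station

    groups = defaultdict(list)
    for path, table in exports.items():
        for estation in table:
            groups[climb(path, estation)].append((path, estation))
    return dict(groups)


# Four exports, two on each side, meeting at the level of "Entrance-A".
exports = {
    ("Entrance-A", "Branch-1", "Trip-Z"): {"5": "zlink"},
    ("Entrance-A", "Branch-1"): {"zlink": "ztlink"},
    ("Entrance-A", "Branch-2", "Trip-T"): {"3": "tlink"},
    ("Entrance-A", "Branch-2"): {"tlink": "ztlink"},
}

groups = equated_groups(exports)
for top, members in groups.items():
    print(top, members)
```

Both trip stations climb to the same name at the "Entrance-A" level and so land in one group. Note that no export mentions the name of its own directory, which is what makes it possible to move a directory elsewhere without rewriting its exports.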

3D and POS files

See Tunnel Survex reader

Further topics

Tunnel was originally written to model and render caves in terms of their volume. The volume was supposed to have been derived from cross sections measured at frequent intervals along passages. Unfortunately, surveyors couldn't be persuaded to gather enough cross sections of good quality to form such models and the project never bore fruit. However, the programming is still there to be used by anyone who wishes to gather the data (probably photographically) and experiment with it.

Please see the tutorial and try downloading and running the program for a more complete overview of the features (half-baked and fully functioning) that are in this cave drafting system. I hope you find it of interest.


  • Cave Data Exchange Working Group A very comprehensive web page of other people's efforts as well as links to good XML primers.
  • Survex The cave surveying software which Tunnel is modelled on.