CUCC Expedition Handbook - Website menu design options

Logbooks - multiple formats

Why
Proposal #1

Update - January 2023

This has now been done. All logbooks use the same format now and we have only one parser.

There is an advantage in using a "separator format" rather than an "encapsulated entry format". When parsing the logbook.html file, everything will be in one of the entries if we use a separator (e.g. <hr>) as opposed to an <article> ... </article> encapsulation. With encapsulation, stuff between encapsulations is probably meant to be in an adjacent entry, but it would sit outside every entry. So we are continuing to use the <hr> separator format style.
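The separator idea can be sketched in a few lines of Python. This is only an illustration, not the troggle parser: the sample text and the names SAMPLE, HR_SEPARATOR and split_entries are made up for the example. It shows that when splitting on <hr>, every piece of the file ends up in exactly one chunk, including a stray paragraph added after an entry.

```python
import re

# Hypothetical sample; a real logbook.html is larger and messier.
SAMPLE = """<h1>Expo logbook</h1>
<hr />
<p>Rigged the entrance pitch.</p>
<hr>
<p>Surveyed the new passage.</p>
<p>A stray note added afterwards.</p>
"""

# Matches <hr>, <hr/>, <hr /> and <HR>, with optional attributes.
HR_SEPARATOR = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def split_entries(html_text):
    """Split on <hr>; every non-blank chunk lands in exactly one piece."""
    chunks = (chunk.strip() for chunk in HR_SEPARATOR.split(html_text))
    return [chunk for chunk in chunks if chunk]


entries = split_entries(SAMPLE)
for number, entry in enumerate(entries):
    print(number, repr(entry))
```

In this sketch the first chunk is whatever precedes the first <hr> (here the title), so a real parser would still need to decide how to treat it.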

HTML formats - Why we needed changes

Maintenance workload

We had 4 different markdown and HTML formats for logbooks of different vintages. This meant 4x as much maintenance as we needed.

LOGBOOK_PARSER_SETTINGS = {
                "2010": ("logbook.html", "Parseloghtmltxt"), 
                "2009": ("2009logbook.txt", "Parselogwikitxt"), 
                "2008": ("2008logbook.txt", "Parselogwikitxt"), 
                "2007": ("logbook.html", "Parseloghtmltxt"), 
                "2006": ("logbook.html", "Parseloghtmltxt"),
#               "2006": ("logbook/logbook_06.txt", "Parselogwikitxt"),
                "2005": ("logbook.html", "Parseloghtmltxt"), 
                "2004": ("logbook.html", "Parseloghtmltxt"), 
                "2003": ("logbook.html", "Parseloghtml03"), 
                "2002": ("logbook.html", "Parseloghtmltxt"), 
                "2001": ("log.htm", "Parseloghtml01"), 
                "2000": ("log.htm", "Parseloghtml01"), 
                "1999": ("log.htm", "Parseloghtml01"), 
                "1998": ("log.htm", "Parseloghtml01"), 
                "1997": ("log.htm", "Parseloghtml01"), 
                "1996": ("log.htm", "Parseloghtml01"),
                "1995": ("log.htm", "Parseloghtml01"), 
                "1994": ("log.htm", "Parseloghtml01"), 
                "1993": ("log.htm", "Parseloghtml01"), 
                "1992": ("log.htm", "Parseloghtml01"), 
                "1991": ("log.htm", "Parseloghtml01"), 
                "1990": ("log.htm", "Parseloghtml01"), 
                "1989": ("log.htm", "Parseloghtml01"), #crashes MySQL
                "1988": ("log.htm", "Parseloghtml01"), #crashes MySQL
                "1987": ("log.htm", "Parseloghtml01"), #crashes MySQL
                "1985": ("log.htm", "Parseloghtml01"), 
                "1984": ("log.htm", "Parseloghtml01"), 
                "1983": ("log.htm", "Parseloghtml01"), 
                "1982": ("log.htm", "Parseloghtml01"), 
            }
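As a rough illustration of the maintenance load, the settings above can be tallied by parser name. The sketch below uses a short excerpt of the dictionary (one or two years per parser) so that it stays self-contained; the names LOGBOOK_PARSER_SETTINGS_EXCERPT and years_per_parser are invented for the example.

```python
from collections import Counter

# Excerpt of the settings shown above, not the full table.
LOGBOOK_PARSER_SETTINGS_EXCERPT = {
    "2010": ("logbook.html", "Parseloghtmltxt"),
    "2009": ("2009logbook.txt", "Parselogwikitxt"),
    "2003": ("logbook.html", "Parseloghtml03"),
    "2001": ("log.htm", "Parseloghtml01"),
    "2000": ("log.htm", "Parseloghtml01"),
}


def years_per_parser(settings):
    """Count how many expedition years depend on each parser name."""
    return Counter(parser for _filename, parser in settings.values())


usage = years_per_parser(LOGBOOK_PARSER_SETTINGS_EXCERPT)
distinct_parsers = sorted(usage)
print(len(distinct_parsers), "parsers to maintain:", distinct_parsers)
```

Run against the full dictionary, the same function would show how many years hang off each of the four parsers.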

Complexity - missing entries

Secondly, it is highly likely that most of the different parsers have errors and so some logbook entries do not get imported. One parser, which we could devote more effort to, would mean data does not get mislaid.

Thirdly, the current format is error-prone and nonsensical, so it is an unnecessary learning curve for all expoers.

Logbooks: Proposal #1 - One Single Format

Architecture

Use new HTML5 tags e.g. <article> --stuff-- </article> or, ideally, another tag that does not allow nesting.
Use closing tag at end of entry - no implicit merging of entries
Explicitly handle content not in a logbook entry, e.g. title, frontispiece.

There are several HTML structural tags we could choose, see HTML5 structural elements.
DIV, SECTION, ARTICLE, ASIDE
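A minimal sketch of what an encapsulated-entry parser might look like, using Python's standard html.parser module. It is an assumption about one possible design, not the parser troggle uses; the class name ArticleCollector and the sample text are hypothetical. HTML itself permits nested <article> elements, so the "no nesting" rule has to be enforced by the parser, as done here. Content outside any entry (title, frontispiece, orphaned notes) is collected separately so that it is handled explicitly rather than lost.

```python
from html.parser import HTMLParser


class ArticleCollector(HTMLParser):
    """Collect text inside <article> elements and text found outside them."""

    def __init__(self):
        super().__init__()
        self.entries = []   # text of each <article> ... </article>
        self.outside = []   # text not inside any entry
        self._in_entry = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            if self._in_entry:
                # Our own rule for this sketch: entries must not nest.
                raise ValueError("nested <article> found")
            self._in_entry = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "article" and self._in_entry:
            self._in_entry = False
            self.entries.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._in_entry:
            self._buffer.append(data)
        elif data.strip():
            self.outside.append(data.strip())


# Hypothetical sample text.
SAMPLE = """<h1>Expo logbook</h1>
<article><p>Rigged the entrance pitch.</p></article>
<p>Orphaned note between entries.</p>
<article><p>Surveyed the new passage.</p></article>
"""

collector = ArticleCollector()
collector.feed(SAMPLE)
collector.close()
print("entries:", collector.entries)
print("outside:", collector.outside)
```

The orphaned note ends up in the "outside" list, which is the situation a separator format avoids.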

Implementation

Start by exporting using this format from the import parsers
Extensive manual checking for each logbook
Start with 2003 which has a unique parser
Trial new import parser, check it gives same results as old parser on old format
Repeat for each format type
Retire old format parsers, archive old formats of logbook
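The "check it gives same results" step could be partly automated alongside the manual checking. The sketch below assumes, purely for illustration, that each parser returns a list of (date, title) tuples; the function diff_entries and the sample data are hypothetical and not taken from troggle.

```python
from collections import Counter


def diff_entries(old_entries, new_entries):
    """Report entries lost or gained when moving from the old to the new parse.

    Counters are used rather than sets so that duplicated entries are
    compared by count too.
    """
    old_counts, new_counts = Counter(old_entries), Counter(new_entries)
    return {
        "missing": sorted((old_counts - new_counts).elements()),
        "unexpected": sorted((new_counts - old_counts).elements()),
    }


# Hypothetical parse results for one logbook.
old_result = [("2003-07-10", "Rigging the entrance"), ("2003-07-11", "Surveying")]
new_result = [("2003-07-10", "Rigging the entrance")]

report = diff_entries(old_result, new_result)
print(report)
```

An empty report for every logbook would be a reasonable condition to meet before retiring an old parser.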

Advantages

Reduced maintenance load in future
More expoers will write up their logbook entries! Win!
Clearly distinct programming task: would suit newcomer

Disadvantages

Non-urgent work
