The Casebooks Project Transcription Guidelines

by John Young, Michael Hawkins and Robert Ralley
Department of the History and Philosophy of Science, University of Cambridge, Cambridge, UK


These Guidelines should be read in conjunction with the accompanying Element Set. The Guidelines provide a transcription policy and a broad outline of which tags to use where, while the Element Set gives specific details of what exactly is required or permitted in each element and where it may (or may not) be used. The Guidelines offer an introduction to the transcription process; the Element Set is intended primarily for reference. Between them, these documents are (or at least aim to be) a plain English version of the Casebooks Project schema.

So far as possible, jargon is kept to a minimum, but familiarity has been assumed with certain key XML terms and the means of representing them: principally element (and the distinction between empty and non-empty elements), entity, attribute and attribute value. It is essential that transcribers are entirely clear about these terms, and understand the principle of nesting elements. These terms are not difficult to grasp and are explained in any guide to XML. Newcomers to the language may find the 'Gentle Introduction to XML' on the TEI (Text Encoding Initiative) website useful.

A particularly useful tag is the 'comment tag', which transcribers and editors are strongly encouraged to use liberally. This takes the form '<!-- -->', and may contain, between the two sets of double dashes, any comment the transcriber wishes to make for in-house purposes on the source or the transcription, e.g. '<!-- the judgment is above the chart - rr -->' or '<!-- what on earth is he on about? - jy -->'. Its contents will not be publicly visible, though it is advisable to avoid anything libellous or obscene. It can be very helpful for proofreaders to be told, for instance, where exactly a particularly well-hidden urine note appears, or why a potentially contentious coding decision has been made. Comment tags can also be used to express any doubts or queries about a particular interpretation, to record the content of event information, to explain the rationale for a conjecture, or in short to convey any information not covered by the tagging as such that may be of use to the editors when reviewing the data. It is helpful for such comments to be signed, as in the above examples, with the relevant commenter's initials.

A number of particular, capitalised, code headings for comment tags are set out on the 'Types of comment tag' page of the project's wiki. The most important ones are <!-- CHECK ORIGINAL --> and <!-- TODO CB ORIG -->, which mean, respectively, that the original document needs to be checked because the available image is deficient in some respect, and that a particular item of metadata will or may have to be supplied or revised once such checking has taken place. These are absolutely indispensable to the person who does the final checking, as they enable her or him quickly to identify (by means of a global search) which manuscript pages need to be revisited and precisely which pieces of metadata require attention.

Throughout this document, the term 'text string' is used to mean 'any quantity of continuous text': this may be a single letter within a word, a whole word, a sentence, five-and-a-half words, ten paragraphs or whatever.

Please pay especial attention to the instructions concerning spacing around elements. Some of these may look like rules for rules' sake but they are there for a purpose and have important ramifications for the display of the transcribed texts. When reading densely marked-up text in XML, it is alarmingly easy to overlook the presence of a space that should not be there or the absence of one that should be.

In most of the real-life examples cited in these guidelines, some of the tagging used in the original files has been excised, to avoid distraction from the point at issue. For instance, brevigraphs are quoted in expanded form except where the mechanism for expansion is what is being illustrated; errors in the original are silently corrected except where the <sic>/<corr> mechanism is what is being illustrated, and so forth.

Almost all the coding now used in Casebooks Project markup is derived from the TEI Guidelines (version P5), but a number of project-specific elements have been introduced, all beginning 'cb:', e.g. <cb:consultation>, <cb:querent>, <cb:subject>, to cater for certain specialised terms crucial to Casebooks methodology. These terms are explained in the Header section of the Guidelines.

The Header (<teiHeader>)

Metadata (information about the electronic file and its contents) is recorded in a <teiHeader> element. This has three components, <fileDesc>, <profileDesc> and <revisionDesc> (always in that order).
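
As a rough orientation, the overall shape of the header can be sketched as follows. This is a bare skeleton only: the comments are glosses added for illustration, and the description of <revisionDesc> follows general TEI usage rather than anything stated in this section.

```xml
<teiHeader>
  <fileDesc>
    <!-- the electronic file and its source -->
  </fileDesc>
  <profileDesc>
    <!-- the contents of the file and of its source -->
  </profileDesc>
  <revisionDesc>
    <!-- record of changes made to the file (general TEI usage) -->
  </revisionDesc>
</teiHeader>
```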

I. <fileDesc>

describes the electronic file and its source and consists of the following elements:

1) <titleStmt>, which contains only the element <title>. Once files have been made available online, this is the word CASE followed immediately by the number assigned in the pilot project or subsequently. However, this has no real significance beyond being a unique number by which to identify the case. If another case turns up or we decide that what had been recorded as one case is really two, the new file becomes CASE[whatever the next number in the sequence is at the time]. If we decide that what had been recorded as two cases is really only one, one of the titles simply disappears from the sequence.

Provisional titles are assigned to transcriptions in progress, based on the call number, folio, and sequence on the page (e.g. MS_Ashmole_237-f0153r-3 is the third case (reading first down the left column and then down the right, irrespective of whether or not this equates to the chronological order of the entries) on f. 153r of MS Ashmole 237). These provisional titles are then auto-converted to arbitrary but unique CASE numbers once a given volume has been completed.
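
By way of illustration, a <titleStmt> might look something like this before and after conversion. The provisional title is the one cited above; the case number is purely hypothetical and is not the number actually assigned to that entry.

```xml
<!-- transcription in progress: provisional title -->
<titleStmt>
  <title>MS_Ashmole_237-f0153r-3</title>
</titleStmt>

<!-- volume completed: hypothetical case number -->
<titleStmt>
  <title>CASE99999</title>
</titleStmt>
```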

2) <publicationStmt>, which contains <publisher> (The Casebooks Project), <pubPlace> (Cambridge) and <date>, with the date expressed in ISO style (yyyy-mm-dd) as the @when value of <date> and spelled out using the formula '01 April 2010' as the content of <date>. The content and @when values of <date> will be auto-generated as files are released online.
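
Using the date given in the formula above, a <publicationStmt> would presumably take roughly this form (a sketch only, since in practice the date is auto-generated on release):

```xml
<publicationStmt>
  <publisher>The Casebooks Project</publisher>
  <pubPlace>Cambridge</pubPlace>
  <date when="2010-04-01">01 April 2010</date>
</publicationStmt>
```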

3) <notesStmt> (optional), which contains one or more <note>s, which in turn may contain either text or <p> (though it seems unlikely that a note here will run to more than one paragraph: if it does it should probably be two or more <note>s). These notes each take one of the @type values defined in the entry for <note> in the Element Set (and, optionally, a @resp value of # and the relevant editor's @xml:id value if he or she wishes to claim/admit authorship) and can be used to supply more detailed information about the source document or particular problems associated with its transcription, e.g. 'This aborted entry has been given an approximate time and a date based on its position in the manuscript', 'This entry is a note inserted between the question and chart of CASE3884, and the date and time are taken from that entry', 'The manuscript is water-damaged here'.

Where such references to other entries are included, a link should be provided by means of a <ref> element as explained in the entry for <ref> in the Element Set.

There can be no hard and fast rules about what can or cannot be expressed in <notesStmt>, whether in terms of describing the source document itself, its content, or the interpretation of that content. So far as possible, information about both the document itself and its contents should be recorded in the tagging rather than the <notesStmt> but this provides a useful fallback option for at least provisionally recording any supplementary data.

4) <sourceDesc>, consisting of the following:

a) <bibl type="positionOnPage">: the quadrant(s) as defined in the pilot project (1 = upper left, 2 = lower left, 3 = upper right, 4 = lower right). This normally refers to the quadrant in which the record begins, irrespective of whether or not it then proceeds into a lower quadrant (i.e. from Q1 to Q2 or from Q3 to Q4). However, a certain amount of leeway is permissible: if a case largely occupies Q2 or Q4 but strictly speaking begins in Q1 or Q3, it can be treated as being in Q2 or Q4. The purpose of this is to help users (and ourselves) figure out where on the image to find the entry, so if there are only four cases on a page it makes sense to call them 1, 2, 3 and 4 rather than 1A, 1B, 3A and 3B in a case such as MS Ashmole 234, f. 43v, where the lower 'half' of the page is actually quite a bit larger than the upper 'half'.

If more than one case begins in the same quadrant, they can be distinguished as (for instance) 2A, 2B, 2C etc., following the order in which the entries were written (or what seems the most plausible order if there is no concrete evidence).

If, however, an entry crosses a column break or jumps from one part of the page to another, this needs to be spelled out. Both practitioners quite often begin a case in Q2 and finish it in Q4 (i.e. they have divided the page horizontally as well as vertically). In such cases, it is sufficient to give the content of <bibl type="positionOnPage"> as '2, 4'. Napier in particular is prone to scatter the component parts of a given entry around a page in a counter-intuitive order, but provided there is only one page involved, the same procedure can be used, e.g. '<bibl type="positionOnPage">3A, 4, 2</bibl>'.

If the entry occupies more than one page (e.g. CASE1387, which begins in quadrant 4 of MS Ashmole 226 f. 42r and for some reason continues in quadrant 4 of MS Ashmole 226 f. 46r), this needs to be spelled out as '<bibl type="positionOnPage">f. 42r/Q4, f. 46r/Q4</bibl>'.

An <altIdentifier> is also provided (automatically, at the time of the file's creation) to indicate that, for instance, the shelfmark 'MS Ashmole 182' equates to 'Napier Volume 2' in Casebooks terminology.

b) <msDesc>, which contains:

i) <msIdentifier>, which in turn contains <country>, <settlement>, <repository> and <idno>. The attribute values (if needed) and content of these elements are auto-generated from existing project documentation, but if component parts of the entry appear on more than one page the content of <idno> requires handcrafting.

The content of <idno> consists of the manuscript number in question and the number(s) of the page(s) on which the entry appears, e.g. 'MS Ashmole 207, f. 104v'. If the manuscript has been paginated rather than foliated, the original page numbers should be cited here, preceded by 'p.' rather than 'f.'.

If the entry runs to more than one page and is in a logical, uninterrupted sequence, the relevant pages should be cited as a range, on one of the following models:

f. 246r-v (singular 'f' since only one folio is involved, albeit both sides of it)
ff. 246v-247r
pp. 246-247

If the entry runs to more than one page and is not in a logical sequence, e.g. it proceeds backwards from f. 246v to f. 246r, or it leapfrogs a page somewhere in the middle, the page or folio numbers given in <idno> should be presented in the order they were intended to be read in (or a best guess at the order they were intended to be read in) and separated by commas, e.g. 'f. 106v, f. 109r'.
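
A single-page <msIdentifier> might therefore look roughly as follows. The <idno> content is the example given above; the content shown for <country>, <settlement> and <repository> is an assumption made for illustration (the Ashmole manuscripts are held in the Bodleian Library, Oxford), since the actual wording is auto-generated from project documentation.

```xml
<msIdentifier>
  <!-- content of the first three elements is assumed, not quoted from project files -->
  <country>United Kingdom</country>
  <settlement>Oxford</settlement>
  <repository>Bodleian Library</repository>
  <idno>MS Ashmole 207, f. 104v</idno>
</msIdentifier>
```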

ii) <msContents>. This contains <msItem>, which in turn contains <locus> and <title>. These elements, too, are auto-generated from existing project documentation. However, the @from and @to values of <locus> require handcrafting if the component parts of an entry appear on more than one page.

The @from and @to values of <locus> are normally both the same and are the page or folio number in question rendered as a 4-digit string with leading zeros if necessary, and with a suffix of 'r' or 'v' if the document is foliated rather than paginated, e.g. '<locus from="0104v" to="0104v"/>' for an entry in a foliated document, or '<locus from="0104" to="0104"/>' for an entry in a paginated document.

If the entry runs to more than one page and is in a logical, uninterrupted sequence, the @from value is that of the page it begins on and the @to value that of the page it ends on. If it runs to more than one page and is not in a logical sequence, it needs two or more <locus> elements, each defining one of the pages, listed in (what is presumed to be) the intended reading order (see under <locus> in the Element Set for examples).
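
Applying these rules to the folio references used as models above would give something like the following. This is a sketch derived from the rules as stated here; the examples in the Element Set remain the authoritative reference.

```xml
<!-- one page: @from and @to are identical -->
<locus from="0104v" to="0104v"/>

<!-- logical, uninterrupted sequence (ff. 246v-247r) -->
<locus from="0246v" to="0247r"/>

<!-- not in a logical sequence (f. 106v, then f. 109r):
     one <locus> per page, in the presumed reading order -->
<locus from="0106v" to="0106v"/>
<locus from="0109r" to="0109r"/>
```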

There are some cases in which the foliation or pagination is itself illogical, e.g. there is no f. 50, there are two f. 50s, or there is an unnumbered folio between f. 50 and f. 51. In the first of these cases, the omitted folio number can simply be omitted from the transcribed texts (i.e. there will be no entries with 'f. 50r' or 'f. 50v' as the content of <idno>, or '0050r' or '0050v' as the @from or @to value of <locus>). In the latter two, the second page or folio takes the number of the first one with a capital A appended to it (e.g. 'f. 50Ar'). If there are more than two pages or folios with the same number, or more than one unnumbered page or folio between consecutively numbered pages or folios, the second misnumbered or unnumbered page or folio takes the suffix B, and so forth.

II. <profileDesc>

describes the contents of the file and of its source. This is the most important part of the file, and the source for data harvesting. It consists of the following:

1) <handNotes> (mandatory). This in turn contains one or more <handNote> elements, each with a @sameAs value of #sforman or #rnapier, or # followed by the @xml:id value of the scribe as declared in <listPerson>. These point to the <person>(s) in whose hand(s) the transcribed text is written. Hands that only feature in the untranscribed portions should not be recorded in <handNotes>, though it is helpful to note their presence in comment tags. The content of <handNote> is simply the scribe's identity in natural language ('Simon Forman', 'Richard Napier', 'Gerence James', 'Unidentified Hand', or whatever appellation has been decided on for distinguishable but as yet unidentified scribes). See 'Hands in the casebooks' for a list of <roleName>s and @xml:id values defined to date.
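
For an entry written partly by Napier and partly by Gerence James (whose @xml:id value, as noted under <listPerson>, is gjames), <handNotes> might be sketched as:

```xml
<handNotes>
  <handNote sameAs="#rnapier">Richard Napier</handNote>
  <handNote sameAs="#gjames">Gerence James</handNote>
</handNotes>
```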

2) <langUsage> (mandatory except in the few cases where the entry has no textual component at all, e.g. it consists solely of a chart). This contains one or more <language> elements, each with the @ident value en, la, el, he, fr or und (English, Latin, Greek, Hebrew, French, Undetermined). 'Undetermined' is for entries that did contain text originally but have been damaged to such an extent that it is impossible to determine what language it was in.

Only the languages that feature in the transcription should be listed, not those that appear only in other, untranscribed, parts of the document. Where there is more than one language they should be listed in order of priority: e.g. if the document is primarily in Latin with a few words of English, <language ident="la"> should be the first element in <langUsage>; if the contrary, <language ident="en"> should be.
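
For a predominantly Latin entry with a few words of English, the ordering rule would give something like the following. The @ident values are those listed above; the text content of each <language> element is an assumption, as its form is not specified here.

```xml
<langUsage>
  <!-- primary language first -->
  <language ident="la">Latin</language>
  <language ident="en">English</language>
</langUsage>
```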

The various abbreviations of 'ante meridiem', 'in meridie' and 'post meridiem' are being normalised anyway and do not count as 'Latin' for <langUsage> purposes if they occur in otherwise English text. Forman's idiosyncratic term 'questo' appears to mean either 'question' or 'quaestio' according to context, and does not need to be distinguished as 'foreign' in either case. The same goes for 'Anno', 'stat' (meaning either 'state' or [in Latin] 'status'), and for his even more idiosyncratic 'halk/halek' (meaning 'has/have/had sex with'), which may occur in either language.

For guidance on how to deal with language changes within the body text, see the Content Tagging section of the Guidelines.

3) <particDesc> (participant description: mandatory). 'Participant' is here defined very broadly, meaning the practitioner(s), the scribe(s) (if relevant), and anyone mentioned in the transcribed portion of the entry, even if they do not, strictly speaking, participate in the consultation. It may also, as explained below, include people mentioned only in the untranscribed portion of the entry if they seem relevant.

If, as quite often happens in Napier's records, it appears that a locum or assistant was standing in for the astrologer and delivering judgments, treatments etc. without the master's supervision, the locum or assistant should be treated as practitioner and Forman or Napier should be omitted entirely from <particDesc>. The @name value of <cb:practice>, however, remains forman or napier. In some cases, the question of whether the person who wrote the entry really was acting on his own initiative or merely taking dictation is a matter of editorial judgment, but there are a number of entries in which it is objectively demonstrable that Napier had left his practice in the hands of his curate Gerence James, or of one of various other as yet unidentified assistants.

<particDesc> contains the mandatory <listPerson>, which consists of the following:

a) a sequence of <person> (and, if appropriate, <org> and/or <personGrp>) elements, each of which has a mandatory @xml:id. The @xml:id value for Forman is always sforman, for Napier rnapier, for Gerence James gjames and for Sir Richard Napier srnapier. Values for other identifiable (or at least recognisable) scribes or practitioners will be added as need arises. Completely unidentified scribes take the @xml:id value unknown (or unknown1, unknown2 etc. should there ever turn out to be more than one unidentified hand in a given entry). In all other cases apart from the editors, the @xml:id value for an individual person is generated, if possible, from the person's initials (in lower case), e.g. rc for Richard Carter. If only one initial is known, a one-letter value is permissible. If this leads to duplication, numbers can be appended to the letters to differentiate them, e.g. ar1, ar2.

People who are referred to in the record solely by a <roleName> such as 'Bishop of London' or 'Lady Sanquhar' (where 'Sanquhar' is a title, not a surname, as in the case of Anne Crichton, Lady Sanquhar) should be given an @xml:id value of "person1" (or "person2", "person3" etc. should more than one such person appear in a given entry).

If there is no indication whatsoever of any other participant's name and no means of inferring it from the judgment or other contextual evidence, he or she should be entered as <person xml:id="anon"> (e.g. CASE1173 where the question reads only 'At 3 post m the first feb &Tuesday; 1597', and no judgment is given). If there are two or more completely anonymous but obviously distinct persons involved, they should be designated anon1, anon2 etc.

If people who have some fairly obvious relevance to the case are specified only in the untranscribed part of the document, they should be listed in the header with the relevant <person> element taking an @evidence value of internal. People mentioned only in passing in the untranscribed portion can be ignored in the header. The question of whether or not people mentioned in the untranscribed portion are in fact relevant is a matter of editorial judgment, but suspected thieves or witches, for instance, should certainly be included and listed in <cb:consultation> as <cb:object>s (see below). Other people referred to by name or occupation in the untranscribed section should also be included (though not necessarily ascribed a role), on the grounds that this is surely concrete information provided by the querent rather than something the practitioner believed he had read in the stars. Thus, 'shee loved on that did deceive her' does not warrant the inclusion of an anonymous male participant, but 'shee loved John Smith that did deceive her' or 'shee loved a baker of Stony Stratford that did deceive her' does warrant the inclusion of John Smith or an anonymous male baker in <particDesc> (it seems reasonable to assume, when dealing with texts from this period, that same-sex romantic or sexual partners would not have been referred to as such during a consultation). There are even a few cases in which a querent or subject is mentioned only in the untranscribed section, in which case <cb:querent> and/or <cb:subject> should also take an @evidence value of internal.

There will always be at least one <person>, viz. the practitioner (who may also be the querent and may also be the subject).

<person> contains as many of the following as are applicable:

i) <persName>, which in turn contains as many as applicable of <roleName>, <forename>, <surname>, <genName>, <nameLink>, <addName>, <name>.

<roleName> covers appellations such as 'Mr', 'Mrs', 'Sir', 'Goodwife', or other formulae besides personal names, family relations (e.g. 'Smiths cosen') or job titles used to specify a person's identity. E.g.

<persName><roleName>Mr</roleName> <forename>John</forename> <surname>Perkins</surname></persName>
<persName><roleName>Lady</roleName> <surname>Throckmorton</surname></persName>

Titular role names such as 'Lord', 'Mrs', 'Alderman', 'Dr', 'Goodwife/Goody' should be placed first within <persName>, while more descriptive ones (e.g. 'Bishop of London') should be placed last. In most cases this reflects the way in which they are presented in the original.

There are cases in which it is debatable whether a given formulation counts as a <roleName> or not. Less clear-cut examples will have to be resolved by editorial consultation. Aristocratic role names can be particularly problematic in this respect: for instance, the regular client Napier refers to as 'Lady Zancher' turns out to be the historically attested Anne Crichton, nee Fermor, Lady Sanquhar, and should be given the <roleName> 'Lady Sanquhar' if none of her alternative appellations is mentioned in the transcribed part of the entry.

Role names given in Latin in the original should be translated into English in the header. This is a fairly rare occurrence usually involving 'Domina' or 'Dominus', meaning (confusingly) either 'Mrs/Mr' or 'Lady/Lord', but it is normally possible to determine either from external evidence or from elsewhere in the casebooks which translation is to be preferred.

Where variant forms of some or all of a person's name are specified, more than one <forename> and/or <surname> can be used, with whatever is considered to be the less canonical version taking the @type value alternate. E.g.

<persName><forename>Edward</forename> <surname>Mabsden</surname> <surname type="alternate">Butcher</surname></persName>

for 'Edward Mabsden alias butcher' (CASE1784).

If the entire name consists of two alternative but not mutually exclusive formulations, e.g. 'mres Elis: Tyrrle the young Lady Tyrle' (CASE30263), two <persName> elements should be given, without attributes:

<persName><roleName>Mrs</roleName> <forename>Elizabeth</forename> <surname>Tyrrell</surname></persName> <persName><genName>Young</genName> <roleName>Lady</roleName> <surname>Tyrrell</surname></persName>

In most cases, where alternative surnames are recorded in the source text, they are the maiden, married, or previous married names of female participants. Normally, one of the alternatives is referred to in the record as 'alias' or 'otherwise', with no indication of the status of either name. In the comparatively rare cases where the status of such names can be at least conjecturally determined, the variant surnames can each take the @type value maiden or married.
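
Taking Anne Crichton, nee Fermor, as a convenient illustration (the markup below is a sketch of the principle and is not quoted from an actual project file), a name whose two surnames can be classified might be rendered:

```xml
<persName><forename>Anne</forename> <surname type="married">Crichton</surname> <surname type="maiden">Fermor</surname></persName>
```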

<surname> may also take the @type value adopted in the extremely rare event of someone being explicitly described as having adopted a different surname (as opposed to acquiring one by marriage or being referred to under a false name).

The basis on which a name has been determined where none is explicitly stated in the document may be clarified by applying @evidence values of internal, external or inferred. internal means that it has been taken from the untranscribed part of the entry in question. external means that it has been taken from a source external to the entry in question (including elsewhere in the casebooks). inferred applies to cases such as assuming that an unnamed wife has the same surname as her husband, an unnamed father has the same surname as his 5-year-old daughter, etc.

@cert values of high, medium or low may also be applied to any or all of <persName>, <forename> and <surname> to express the editor's confidence about the content of the element in question. For instance, if an entry clearly states a woman's surname and includes a mention of her husband, the husband's surname may be inferred to be the same as the wife's, but only (in the absence of any evidence that the name in question is the wife's married name) with a medium level of certainty, as it is not at all uncommon at this period for married women to be referred to by their maiden names.
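
The husband-and-wife scenario just described might be sketched as follows. The names are hypothetical; only the use of @evidence and @cert is the point at issue.

```xml
<!-- the wife, whose surname is clearly stated in the entry -->
<persName><forename>Joan</forename> <surname>Smith</surname></persName>

<!-- her husband, mentioned but not named: surname inferred from hers -->
<persName><surname evidence="inferred" cert="medium">Smith</surname></persName>
```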

There are, moreover, a few cases in which it appears that the practitioner or the querent has given a false name for one of the participants. This applies mainly to entries Napier has copied from Forman's records, in which the case details are virtually identical in all respects except the name of the querent and/or subject. There are also cases (e.g. 14425, 14473, 20540, 20541, 20615) in which the querent demonstrably gives a false name to the person he or she is asking about. In such cases, the relevant part(s) of the name should be given the @type value deckname.

In the case of Decknamen, a @resp attribute may also be applied to resolve any ambiguity as to who is responsible for the subterfuge, e.g. a @resp value of #rnapier if Napier is falsifying the name, a @resp value of #t if (as in the cases cited above) Mr Trendall is asking about a subject under a cover name, or values of # plus the @xml:id value of the person in question for self-conferred Decknamen.
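
A hypothetical instance, in which Napier is taken to have substituted a false surname, might look like this. The name is invented, and placing @resp on the same element as @type="deckname" is an assumption about where the attribute belongs.

```xml
<!-- invented name; the surname is the falsified part, and Napier is held responsible -->
<persName><forename>John</forename> <surname type="deckname" resp="#rnapier">Smith</surname></persName>
```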

There are also very occasional instances of two variant spellings of a forename or surname being explicitly presented as alternatives, as in CASE2937, which has 'Richard Gore (Cuer)'. (If the name is simply spelled two different ways at different points in the entry, it should be regularised in the header, but here Forman is clearly going out of his way to point out that these two very variant spellings do indeed mean the same person.) In such cases, the second option takes the @type value variant.
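
On this principle, the CASE2937 name would presumably be tagged along the following lines. This is a reconstruction from the rule as stated rather than a quotation from the file, and the regularised spellings actually adopted may differ.

```xml
<persName><forename>Richard</forename> <surname>Gore</surname> <surname type="variant">Cuer</surname></persName>
```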

Nicknames, if identifiable as such, can be dealt with by applying the @type value nickname to <forename>, <surname> or <name>.

In other cases of people being given variant designations that have nothing to do (or nothing obviously to do) with their marital status, the relevant element should also be given the @type value alternate.

If there is no evidence to the contrary, where someone is identified in the records as 'X alias Y', Y should be considered the 'alternate' name. Note that 'alias' in sixteenth- and seventeenth-century English is used much more loosely than in modern English and does not necessarily imply disguise or subterfuge, being more akin to the modern 'aka'.

<genName> contains standard formulae used to distinguish between people who otherwise have the same name, e.g.

<persName><roleName>Mrs</roleName> <forename>Elizabeth</forename> <surname>Shaw</surname> <genName>junior</genName></persName>
<persName><genName>Old</genName> <forename>Anne</forename> <surname>Emerson</surname></persName>

From currently available evidence, it seems that Napier consistently uses 'old' and 'young', in formulae such as 'Old Mr Smith' or 'Young Mrs Blundell', as <genName>s rather than simple adjectives.

<nameLink> covers connecting (usually possessive) prepositions within proper names, such as 'ap', 'van', 'de'. There are frequently inconsistencies in the source as to whether or not these should be capitalised, and as to whether or not they should be regarded as part of the surname (van Dyck, Vandyck?). There are also instances of their being mutilated by anglicisation, e.g. 'Fan Hoeck'. In the material under consideration here, however, these are fairly rare, so the precise rendition can be sorted out on a case-by-case basis.
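
Where the preposition is treated as separate from the surname, the markup might be sketched as below. The name is chosen purely for illustration and is not drawn from the casebooks.

```xml
<persName><forename>Anthony</forename> <nameLink>van</nameLink> <surname>Dyck</surname></persName>
```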

<addName> (which is used very rarely) can be applied to soubriquets (typically nominal phrases) that defy more precise classification, e.g. 'The Grand Turk' or 'Deaf Alice' (see the entry for <addName> in the Element Set).

<name> can be used in cases where it is unclear whether the name given is a forename or surname (or, indeed, nickname or anything else).

If some or all of a person's name is not explicitly stated, or is given in code, or is drastically abbreviated, but can be deduced from external evidence, <persName>, <forename> and <surname> (but not, for some reason, <name> or <genName>) can all take an @evidence value of external and, if there is any doubt about the identification, a @cert value of high, medium or low. If all or part of the name is given only in one of the untranscribed sections of the entry in question, @evidence takes the value internal.

The proper nouns in <persName> should be given in regularised forms as established on the project's wiki page.

If any of the components of <persName> is unclear or only partially legible from the available image, whatever is legible or conjecturable should still be recorded in the header, with the same tags as are applied in the text, e.g. '<persName><roleName>Mr</roleName> <forename>John</forename> <surname>V<gap extent="1" unit="chars" reason="hand"/><unclear cert="low" reason="hand">own</unclear></surname></persName>' (CASE23404). In such a case, regularisation is obviously impossible, but only completely indecipherable components of <persName> should be omitted from the header.

ii) <sex>, which may be '<sex value="1">M</sex>', '<sex value="2">F</sex>' or '<sex value="0">U</sex>' (unknown). This can also have a @cert value of high, medium or low if sex can be inferred but with less than total confidence, and an @evidence value of conjecture, external or internal to indicate the grounds on which sex has been determined or conjectured if this is not evident from the transcribed part of the entry. <sex> should always be included, even if the @value value is 0 (unknown), as this provides a means of checking that the gender really is unknown as opposed to the transcriber's having forgotten to record it. The gender even of wholly anonymous participants can quite often be deduced from personal pronouns, Latin grammar or other internal evidence.
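
Putting <persName> and <sex> together, two minimal <person> entries might look something like this: one for Richard Carter (using the @xml:id value rc suggested above), and one for a wholly anonymous participant whose sex cannot be determined.

```xml
<person xml:id="rc">
  <persName><forename>Richard</forename> <surname>Carter</surname></persName>
  <sex value="1">M</sex>
</person>

<!-- no name and no evidence of gender: <sex> is still recorded -->
<person xml:id="anon">
  <sex value="0">U</sex>
</person>
```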

iii) <age>, with a numerical @value value (in years by default) repeated as the content, e.g. '<age value="12">12</age>'. This can also take a @precision value of high, medium or low if the age is explicitly given as approximate, e.g. 'of about 20 yeres'. However, it can be taken as read that almost all ages are guesses or approximations (if not outright lies by the querent). Where a range is given, e.g. 'of between 40 and 50 yeres', 'of more then sixty', @value can be replaced by @atLeast and/or @atMost. Where the age of small children is given in days, weeks or months rather than years, in fractions of years, or in a mixture of temporal units, a @unit value can be invoked: '<age value="6" unit="months">6 months</age>', '<age value="30" unit="months">2 and a half years</age>'. In such cases, the content of <age> should be a modernised form or English translation of whatever formulation is used in the original.

In the case of ranges, the content of <age> should be the lower and upper limits of the range (if both are available), linked by a hyphen and with no spaces. E.g., taking the hypothetical example above, '<age atLeast="40" atMost="50">40-50</age>'. Where only one limit is known, the following models can be used:

<age atLeast="61">over 60</age>
<age atMost="19">under 20</age>

Or the content may use the formulae 'at least' and/or 'at most' if it seems more appropriate in natural language. This applies especially in the case of inferred ages of young children, e.g. a child is known to be either the same age as or at least nine months older than its one-month-old sibling, in which case the content may read 'at least 1 month'. (In such a case it would in principle be feasible to describe the child as either 1 month or at least 10 months old, discounting the possibility of premature birth, but it hardly seems worth it.)

@atLeast and @atMost can also be used if, for instance, the second digit of someone's age is lost or illegible. Matters become more complicated if the first digit is lost, so that '<gap unit="chars" extent="1"/>6' may mean 16, 26, 36, etc. up to 96 (it seems reasonable to suppose that none of the people mentioned in the casebooks lived to much more than 100, though centenarians are not unknown). In such cases, multiple <age>s can be given, each with an @xml:id value of age1, age2 etc. and an @exclude value pointing to all the incompatible alternatives, on this model (assuming for brevity's sake that the person in question is known to be under 40):

<age xml:id="age1" value="16" exclude="#age2 #age3">16</age>
<age xml:id="age2" value="26" exclude="#age1 #age3">26</age>
<age xml:id="age3" value="36" exclude="#age1 #age2">36</age>

The same procedure should be adopted if two or more conflicting ages for a given person are specified in the same entry and there is no evidence as to which should be preferred or which represents the practitioner's last thought on the matter.

iv) <residence>. If anyone's address is specified (however vaguely) in the transcribed part of the entry, it should be marked up in the body text with <rs type="address" xml:id="address1"> (or @xml:id values of address1, address2, address3 etc. if more than one address is mentioned in a given case). Any terminal punctuation should be placed after the <rs> tag, though medial punctuation may be included within it. The <residence> element in <person> then takes the @sameAs value of # plus the @xml:id value of the relevant <rs>. Obviously, more than one person may live at a given address, and if there are good reasons for supposing two or more <person>s to have the same address, they can (and should) have the same <residence sameAs> value - even if this is not explicitly stated in the text of the document. E.g. if there is no evidence to the contrary, it seems reasonable to assume that children under ten probably live with their parent(s), spouses probably live together, servants probably live with their employers and vice versa. In such cases, <residence> takes an @evidence value of inferred. Where there are grounds for doubt, @cert can be invoked.

<residence> also takes, invariably, a @key value of databaseNormalizedIDKey. This is a placeholder that will be converted to something more meaningful once a proper database of all the addresses mentioned in the casebooks has been constructed.
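
As a rough sketch of how the body markup and the header fit together: the place name and the @xml:id values person1 and person2 are invented for the example, and <residence> is assumed to be empty when it points to an <rs> in the body text.

```xml
<!-- body text: terminal punctuation falls outside <rs> -->
<rs type="address" xml:id="address1">of Cosgrove</rs>.

<!-- header: the person whose address is stated in the text -->
<person xml:id="person1">
  <residence sameAs="#address1" key="databaseNormalizedIDKey"/>
</person>
<!-- header: a young child assumed to live at the same address -->
<person xml:id="person2">
  <residence sameAs="#address1" key="databaseNormalizedIDKey" evidence="inferred"/>
</person>
```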

In the very rare cases in which residence can be determined from the untranscribed part of the entry, <residence> takes an @evidence value of internal and as content a literal transcription of the address as it appears in the source (see the entry for <residence> in the Element Set for an example).

Where former places of residence are mentioned in the text, these should be marked up in the same way but with <residence> taking a @notAfter ISO value of the last plausible date at which it might have been a current residence (the day before the consultation if no other evidence is available) or a @to value of the date on which it ceased to be one if this can be ascertained.
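
For instance, supposing a consultation dated 12 March 1598 (an invented date, as are the @xml:id values), a former residence might be recorded in either of these ways:

```xml
<!-- no further evidence: last plausible date is the day before the consultation -->
<residence sameAs="#address2" key="databaseNormalizedIDKey" notAfter="1598-03-11"/>
<!-- the entry states when the person moved away -->
<residence sameAs="#address3" key="databaseNormalizedIDKey" to="1597-09-29"/>
```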

v) <birth>. If a date (and/or time, and/or place) of birth is specified (typically, but not exclusively, in nativities), this should include <date>, <time> and/or <placeName>. <date> and <time> take @when (or @notBefore/@notAfter) ISO values. <placeName> takes a @sameAs value of # followed by the @xml:id of the relevant <rs type="address"> or <placeName> as marked up in the body text. Or if the place name does not occur in the transcribed part of the text, <placeName> takes a literal transcription of the relevant text as its content (see the example given under <death> in the Element Set: at the time of writing, this situation has never arisen in the context of <birth>).

Where two (or more) mutually exclusive dates or times of birth are given in the text, both or all should be recorded in the header, each with an @xml:id value of date1, date2 (etc.) or time1, time2 (etc.), and an @exclude value pointing to the incompatible alternative(s), on the same principle as incompatible <age>s.

The same applies if a single date is given in the text but there are two or more plausible interpretations of it (e.g. 'Friday 13 July' if the 13 July in question was a Thursday), and it is not possible to establish which option is to be preferred.
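
A sketch of the 'Friday 13 July' case might look like this. The year 1598 is chosen purely for illustration, as one in which (on our reckoning) 13 July fell on a Thursday in the Julian calendar; the <date> elements are assumed to be empty, as they are in <settingDesc>.

```xml
<birth>
  <!-- reading 1: the 13th is right and the weekday is wrong -->
  <date xml:id="date1" when="1598-07-13" exclude="#date2"/>
  <!-- reading 2: Friday is right and the day of the month is wrong -->
  <date xml:id="date2" when="1598-07-14" exclude="#date1"/>
</birth>
```

No @n value is given on either alternative, since the day of the week is not correctly specified for both.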

The same mechanism can be applied to ambiguous or mutually contradictory dates for <death> or <event>.

<date> may also take an @n value of Monday, Tuesday etc. if the day of the week is (correctly) specified in the text, but if not there seems little if any point in supplying this from external sources.

Where any of this data occurs in an untranscribed part of the document, either <birth> itself or any or all of its children should take an @evidence value of internal. Where a date or date range (typically of birth but sometimes of other things) has been worked out by deciphering one of Napier's characteristic circumlocutions such as 'Thursday a sennet before St Peters', <date> takes an @evidence value of "extrapolated".

In the not unusual event that the date of birth given in the entry is incompatible with the age stated in the entry, this can be mentioned in a comment tag, but the data should be left as the source presents them. If the discrepancy is wild enough we may see fit to mention it in the <notesStmt>, but since the value of <age> is generally assumed to be approximate anyway, this does not normally merit any public comment.

vi) <death>. If a date (and/or time, and/or place) of death is specified, even if it is not in the transcribed section of the document, it should be recorded using the same elements as in <birth>. If it is in an untranscribed section, <death> or any or all of its children should take an @evidence value of internal.

vii) <event>. If a date (and/or time, and/or place) is specified in the entry for significant events relevant to a participant's life history other than birth, death or anything covered by <relation> (q.v.), e.g. christening, baptism or burial, it should be encoded in an <event> element, in much the same way as <birth> and <death>, except that <event> needs a @type value explaining what sort of event it is, and must have content (contained in a nested <p>). This should consist of the bare minimum of relevant information, on this model:

'<event type="baptism"><p>Baptised on <date when="1587-01-01">1 January 1587</date> at <time when="11:00:00">11 am</time> at <placeName sameAs="#place1">St Pancras&rsquo; Church</placeName>.</p></event>'.

viii) <occupation>. Where a participant's occupation is stated in the entry, this should be recorded (in modernised form with an initial capital) as the content of <occupation>, e.g. 'Midwife', 'Baker' (see the master occupations list for occupations defined to date). Where an occupation is given in terms that seem potentially confusing to non-specialist readers (e.g. 'man' or 'boy' meaning male servant or apprentice), explanations may be added in brackets, e.g. 'Man (Servant)'.

If a person has more than one occupation, e.g. Humphrey Caucot, 'souldier & shoemaker' (CASE18164), two or more <occupation> elements may be used.

If a person's occupation is explicitly stated but only in the untranscribed portion of the document, <occupation> takes the @evidence value internal.

If a person's occupation is not explicitly stated but can be inferred from the content of the entry (whether the transcribed portion or not), for instance a reference to someone's 'master' or 'mistress' suggesting that the person in question is a servant, <occupation> takes the @evidence value inferred. If there is some doubt or dispute about the inference, it may also be given a @cert value (high, medium or low).

If someone is described as having formerly had a particular occupation, <occupation> takes a @notAfter ISO value of the last plausible date on which he or she might have ceased to have it (normally the day before the consultation unless any further evidence is available), or a @to value of the date on which he or she ceased to have it if such precise information is present in the record.
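
The variations described above might be sketched as follows. The occupation names are illustrative only and should be checked against the master occupations list; the date assumes an invented consultation on 12 March 1598.

```xml
<!-- two occupations stated in the transcribed text -->
<occupation>Soldier</occupation>
<occupation>Shoemaker</occupation>
<!-- stated only in the untranscribed portion -->
<occupation evidence="internal">Baker</occupation>
<!-- inferred from a reference to the person's 'master' -->
<occupation evidence="inferred" cert="medium">Servant</occupation>
<!-- a former occupation, with no evidence of when it ended -->
<occupation notAfter="1598-03-11">Midwife</occupation>
```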

ix) <trait>. In the very rare cases where someone is defined in the original record by her or his place of origin, nationality or other regional affiliation (whether or not this is accompanied by any of the other available components of <person>), e.g. 'Segnior frenchman' (CASE5497) or 'denshir woman' (CASE5388), this should be recorded in a <trait> element with the @type value regionalIdentity. <trait> in turn contains <desc>, which contains a succinct natural-language description of the regional identity, based on the source but with modernised spelling and if necessary English translation ('Frenchman' and 'Devonshire woman' respectively in the above examples).
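
On the basis of the description above, the two examples would presumably be encoded along these lines (a sketch; see the Element Set for the authoritative form):

```xml
<trait type="regionalIdentity"><desc>Frenchman</desc></trait>
<trait type="regionalIdentity"><desc>Devonshire woman</desc></trait>
```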

b) <org>. Where any of the parties to a consultation is not an individual person but a formally constituted group of unspecified people, e.g. 'the docters' (the London College of Physicians) or 'the University of Oxford', such a group - or such groups, if there are more than one - should be noted in one or more <org> elements. Each <org> takes a mandatory 3-letter @xml:id value, and as content the element <orgName>, containing a modernised version of the original description (with an initial capital), e.g. '<org xml:id="drs"><orgName>The doctors</orgName></org>'.

The @xml:id value for 'the docters' (or words to that effect, in Forman's records normally meaning the College of Physicians of London or its representatives) is drs. @xml:id values for other such groups will be assigned by the editors as they arise. (See the 'orgs and personGrps' page on the project wiki.)

A ship may be defined as an <org>, with the @type value ship, in the (fairly rare) cases where the subject or object of a consultation appears to be the ship itself rather than (or as well as) any of the people on board it.

In other respects, <org> functions exactly like <person> except that it does not contain <sex>, and its @xml:id value can similarly be used to derive @ref values for <cb:querent> (though not in the case of ships, obviously), <cb:subject> and <cb:object>, or the @active, @passive or @mutual values of <relation>.

Where any of the participants is not an individual but a group of unnamed people not definable as an <org>, e.g. 'the servants' or 'her friends', they can be defined as a <personGrp> taking an @xml:id value in the same way as an <org> and as content the elements <sex> (male, female or unknown), and if relevant (i.e. if all members of the group have the same job) <occupation>. See the entry for <personGrp> in the Element Set for examples. Like <org>, <personGrp> can be pointed to by the @ref value of <cb:querent>, <cb:subject> or <cb:object>, or by the @active, @passive or @mutual value of <relation>. (See the 'orgs and personGrps' page on the project wiki for @xml:id values defined to date.)
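
A minimal sketch for 'the servants' might read as follows. The @xml:id value svt is invented for the example (actual values are listed on the wiki), and the Element Set examples take precedence if they differ.

```xml
<personGrp xml:id="svt">
  <!-- sex of the group's members is not stated -->
  <sex value="0">U</sex>
  <!-- all members share the same occupation -->
  <occupation>Servant</occupation>
</personGrp>
```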

c) <listRelation> (optional). Where relationships of any sort between persons and/or groups mentioned in <listPerson> can be identified, these are recorded in one or more <relation> elements in <listRelation>, using the @xml:id values defined in <listPerson> preceded by #, thus:

<relation name="master" active="#person1" passive="#person2"/> means person1 is person2's master
<relation name="mother" active="#person1" passive="#person2 #person3"/> means person1 is mother of both person2 and person3
<relation name="friend" mutual="#person1 #person2"/> means person1 and person2 are friends
<relation name="member" active="#person1" passive="#drs"/> means person1 is a member of the College of Physicians of London

Where a relationship is not explicitly stated in the original document but has been deduced by the editor on whatever grounds, <relation> takes a @type value of editorial, an @evidence value of external, internal, inferred or conjecture, and if necessary a @cert value of high, medium or low (not needed if there is no reasonable doubt about the deduction).
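
For example, a maternal relationship deduced by the editor from the content of the entry, with some remaining doubt, might be sketched thus (the particular attribute values are chosen for illustration):

```xml
<relation name="mother" active="#person1" passive="#person2"
          type="editorial" evidence="inferred" cert="medium"/>
```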

The attributes @active and @passive are purely notional and do not imply that one person is more responsible than the other(s) for the nature of the relationship: '<relation name="slave" active="#person1" passive="#person2"/>' is perfectly valid coding to indicate that person1 is person2's slave.

If two people are described as lovers/sweethearts/boy or girlfriends, they can be linked by e.g. '<relation name="romanticPartner" mutual="#person1 #person2"/>'. However, if it appears that person1 is person2's sweetheart but person2 is not, or at least not necessarily, person1's sweetheart, the @name value wouldBeSweetheart should be applied, with the unrequited lover as the @active participant, and <relation> taking a @type value of editorial, since this is a matter of editorial interpretation rather than straight representation of the source data: '<relation name="wouldBeSweetheart" active="#person2" passive="#person1" type="editorial"/>'. In ambiguous cases - for instance, if it is clear that X has an affection for Y but unclear whether the feeling is reciprocated - a <!-- REVISIT --> comment tag should be added explaining the situation, in the hope that some as yet untranscribed entry will shed light on Y's feelings, in which case it will be possible to add an @evidence value of external to the earlier entry and (if necessary) revise the @name value of <relation>.

If the dates at which relationships began and/or ended are specified in the entry, <relation> may take the attributes @from, @to, @notBefore and/or @notAfter. @from is most likely to apply to marriage where a wedding date is given, but may also be used in defining professional, romantic or any other relationships if the evidence is available. @notAfter is particularly useful for ex-relationships, e.g. 'Jeane wit that was mrs blages maid' (CASE10082), for which a @notAfter value of the day before the consultation can be given if no more precise information is available.
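
The following sketch shows both attributes in use. The dates are invented, and the @name value servant is hypothetical: the relations master list should be consulted for the values actually defined.

```xml
<!-- a marriage with a recorded wedding date -->
<relation name="husband" active="#person1" passive="#person2" from="1590-06-01"/>
<!-- an ex-servant, consultation dated 12 March 1598, no more precise information -->
<relation name="servant" active="#person3" passive="#person4" notAfter="1598-03-11"/>
```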

It can be taken as read that if the death date of one of the partners in a relationship is recorded, this constitutes the relationship's @notAfter date, so it does not need to be specified in <relation>.

If the time as well as the date of the commencement (or, theoretically, the termination) of a relationship is specified in the source (this applies almost exclusively to marriages where the date and time of the wedding are given), @from-custom and/or @to-custom should be used. For instance,

'<relation name="husband" active="#xx" passive="#yy" from-custom="1555-07-01T12:00:00"/>'

indicates that XX and YY are recorded as having married at noon on 1 July 1555.

On the mercifully rare occasions where a range of plausible dates can be postulated for the beginning (or end) of a relationship, a series of <relation> elements each with a different @from or @to value can be used, with @xml:id values of relation1, relation2 etc, and @exclude values, as with <age>.

Any number of <relation> elements may be used.

See the relations master list for relationships defined to date.

4) <textClass> (mandatory: defines the type of text in question). This contains only the element <catRef>, which points by means of its @target value to one of the types defined in Types of text, preceded immediately by a hash character. <catRef> may also take a @cert value of high, medium or low in the very rare instances in which there is some doubt about the categorisation of the text.

5) <settingDesc> (mandatory: details about the consultation itself).

a) <date>, which nests in <p>. @when gives the date of the consultation (if known, which it usually is) in ISO format (yyyy-mm-dd) and @n gives the day of the week in natural language. Except in <publicationStmt> and <revisionDesc>, all dates should be given according to the Julian calendar (which is how they are given in the source texts).

If, as quite often happens, there is a discrepancy between the day of the week and the date as recorded in the source (e.g. a date of 'Feb 1 Wednesday' when Feb. 1 of the year in question was a Tuesday), two options are available. If there are compelling grounds for deducing which of the components is a mistake (i.e. for deciding whether the date should be corrected to 'Feb 2 Wednesday' or to 'Feb 1 Tuesday'), the corrected version should be recorded in the metadata and the relevant parts of the transcribed text marked up using <sic>/<corr> as explained in the Normalisation section of the Guidelines. If both options seem plausible, both should be given, using @xml:id and @exclude values on this model:

<p><date when="1601-02-01" n="Sunday" xml:id="date1" exclude="#date2"/></p>
<p><date when="1601-02-02" n="Monday" xml:id="date2" exclude="#date1"/></p>

If both options are plausible but one is more plausible than the other, the two alternative <date>s can take @cert values of high and low respectively. In such cases, the offending part of the source should be corrected using <sic>/<corr> with a @cert value of high on <corr>. If both options seem equally plausible, no @cert values need be applied in the header, and the source text should simply be given as it stands.
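
Where one alternative is preferred, the header might be weighted like this (a sketch reusing the dates from the model above; which reading counts as the likelier one is, of course, invented here):

```xml
<p><date when="1601-02-01" n="Sunday" xml:id="date1" exclude="#date2" cert="high"/></p>
<p><date when="1601-02-02" n="Monday" xml:id="date2" exclude="#date1" cert="low"/></p>
```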

b) <time>, which nests in another <p>. @when gives the time of the consultation (if known) in ISO format (hh:mm:ss).

In both cases, @notBefore and/or @notAfter can be used instead of @when to offer a time range if the exact date or time is not known, and @cert can be applied in doubtful cases. If part of the time is lost or illegible, resulting in a number of mutually exclusive alternatives, @xml:id and @exclude values (of time1, time2, etc.) should be used in the same way as for <age> and <date>.
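
As an illustration, with invented times: a time known only to fall within the morning, followed by a pair of mutually exclusive readings where one digit of the hour is illegible.

```xml
<!-- a range instead of an exact time -->
<p><time notBefore="06:00:00" notAfter="12:00:00"/></p>

<!-- hour illegible: either 10 or 11 -->
<p><time when="10:00:00" xml:id="time1" exclude="#time2"/></p>
<p><time when="11:00:00" xml:id="time2" exclude="#time1"/></p>
```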

If no date or time is given, but it can be deduced with reasonable confidence (typically on the basis of manuscript position and/or internal evidence) that a consultation took place between time A on date X and time B on date Y, @notBefore-custom and @notAfter-custom values should be used. E.g.

'<date notBefore-custom="1599-01-08T11:00:00" notAfter-custom="1599-01-09T07:20:00"/>'

means some time between 11 am on 8 January 1599 and 7.20 am on 9 January 1599. In such cases, if a range has been systematically inferred, for instance on the assumption that the entry was written after the previous one and before the next one in the manuscript (bearing in mind Napier's habit of filling in his pages in an inconsistent and often counter-intuitive order), or that an interrogation was obviously written after the event it refers to and presumably before the next entry in the manuscript, <date> takes an @evidence value of "inferred".

It is permissible for @notBefore and @notAfter-custom or @notBefore-custom and @notAfter values to co-exist within a given <date> if one end of the range can be determined with more precision than the other.
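
For instance, if the lower limit is known only to the day but the upper limit is fixed by the time of the next entry, the range might be sketched as follows (adapting the dates used above):

```xml
<date notBefore="1599-01-08" notAfter-custom="1599-01-09T07:20:00" evidence="inferred"/>
```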

If no evidence at all is available as to the time of a consultation, <time> can be omitted from the header. <date>, however, should always be included, either as a specific date or as a range with specific lower and upper limits, even if it can be narrowed down to nothing more precise than a range of 'some point between the date when the consultant began practising and the date of the consultant's death'.

Dates and times should always be given in full (yyyy-mm-dd/hh:mm:ss) even if no precise date or time can be established. In such cases, a range should be offered, e.g. '<date notBefore="1598-01-01" notAfter="1598-12-31"/>' rather than simply '<date when="1598"/>'.

c) <cb:consultation> (mandatory), which contains as many as applicable of the following:

i) <cb:questionNumber> (optional), with a numerical @n value stating that this is explicitly recorded (whether in the transcribed or the untranscribed portion of the entry) as the querent's first, second or whatever consultation (e.g. <cb:questionNumber n="2" /> where the text contains a formula such as '2a questo' or '2a figura'). This applies even if (for instance) '2a questo' or '2a figura' is appended to what is in fact the querent's fifteenth recorded question: this formulation clearly had a special meaning for Forman and Napier, indicating a sequence of connected questions, though we have not yet categorically established the nature and purpose of these groupings.

ii) <cb:consultant> (mandatory), with a @ref value of the consultant's @xml:id value as defined in <listPerson>, preceded by #.

If necessary (i.e. if there is more than one practitioner), there may be more than one @ref value, e.g. <cb:consultant ref="#sforman #rnapier" />. The same applies to <cb:querent>, <cb:subject> and <cb:object>.

It is also permissible to include more than one <cb:querent> and/or <cb:subject> if they need different @present and/or @evidence values, though if both or all have the same values it is simpler to subsume them into a single @ref as above.

iii) <cb:practice> (mandatory): the practice in which the consultation took place, with a @name value of either forman or napier. In the vast majority of cases, entries in either practitioner's casebooks refer to his own practice (even if someone else is standing in for him as practitioner), but there are a very few examples of cases in Forman's practice being recorded in Napier's casebooks or vice versa, either because one of them is standing in for or collaborating with the other, or because Napier has copied a record from Forman's practice into his own casebook.

iv) <cb:querent> (mandatory): the person(s) who asked the question. This takes a @ref value of the querent's @xml:id value as defined in <listPerson>, preceded by #. (This may be the same person as the practitioner.) It also takes a @present value of yes, no or unclear to say, if possible, whether the question was put in person, unless the querent is or is assumed to be the same person as the subject, in which case @present is only required on <cb:subject>.

In most cases, the querent is assumed to be the same person as the subject if there is no clear statement of her/his identity. In these cases no further attributes are needed. But where there is an explicit statement, however vague, of the querent's identity, e.g. 'her self' (querent=subject) or 'the mother for the child' (querent=mother), @evidence should be internal. If the querent's identity can be deduced from some external source such as another case or Forman's guide to astrology, @evidence should have the value external. If we can infer that the querent is not the same person as the subject but we have no clear evidence about who he/she is, the coding should be <cb:querent ref="#anon" evidence="inferred" /> (where the subject, obviously, is someone other than anon). Where there is significant doubt about the querent's identity, the element can take a @cert value of high, medium or low. NB this is certainty about whether the person in question really is the querent, not whether he or she was present.
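
The main possibilities might be sketched as follows. The @xml:id values are invented, each line stands for a separate entry, and @present is left out for brevity.

```xml
<!-- no statement of identity: querent assumed to be the subject -->
<cb:querent ref="#person1"/>
<!-- 'the mother for the child': explicit statement in the entry -->
<cb:querent ref="#person2" evidence="internal"/>
<!-- identity established from another case -->
<cb:querent ref="#person2" evidence="external"/>
<!-- someone other than the subject, but we do not know who -->
<cb:querent ref="#anon" evidence="inferred"/>
<!-- explicit statement, but real doubt whether this person is the querent -->
<cb:querent ref="#person2" evidence="internal" cert="low"/>
```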

The formulation 'himself/herself/the woman (etc.) present' or 'in presenc' (as Napier usually spells it) should be treated with caution. While 'she sent', 'himself came' and the like seem fairly unambiguous indications that querent=subject, a mere statement of presence may very well suggest that the subject was not present of her or his own volition. In any case, it does not constitute solid evidence that the person in question is the querent. In such cases, if querent is assumed to be subject since there is no positive evidence to the contrary, <cb:subject> should of course take a @present value of yes but <cb:querent> should not take any @evidence value.

The main exception to the default querent=subject assumption is children under ten. It is assumed that someone else is asking the question on their behalf unless they are expressly described as asking it themselves. By the same token, if a subject who is not the same person as the querent is ten or over, he/she is assumed not to be present unless otherwise stated, but under-tens seem likelier to have been brought along to their consultations and should be given a @present value of unclear if there is no explicit statement on the issue.

There are occasional cases in which it is apparent that a child of ten or more has been accompanied by an unspecified querent, as in CASE3118, for 12-year-old Margery Heath, where 'the child present' can hardly be taken as meaning that Margery put the question herself, but no other querent is recorded.

Where the subject is known to be under 10 and no querent is specified, the presumed querent should be given the @xml:id value anon, as there is a very considerable chance that he or she is not the child's birth parent, if he or she exists at all.
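
A sketch of the under-ten default, for an invented four-year-old subject with no querent specified and no statement about presence. The @evidence value on <cb:querent> follows the model for unidentified querents given above, and the @present value of unclear on the anonymous querent is our assumption rather than a stated rule.

```xml
<!-- in <listPerson> -->
<person xml:id="person1"><sex value="1">M</sex><age value="4">4</age></person>

<!-- in <cb:consultation> -->
<cb:querent ref="#anon" present="unclear" evidence="inferred"/>
<cb:subject ref="#person1" present="unclear"/>
```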

The other exception to this rule of thumb is where subjects are manifestly deranged or incapacitated, or the entry features such derogatory accounts of their behaviour, which are clearly not the practitioner's own comments, as to make it barely conceivable that they could themselves be the source of the information (or allegation). In such cases, the querent should be treated as anonymous, with an @evidence value of conjecture and a @cert value if appropriate.

If there is reasonable doubt as to which of the people recorded in <listPerson> is the querent, both or all the mutually exclusive candidates should be listed, each in a separate <cb:querent> element, with an @xml:id value of quer1, quer2 (etc), a @ref value pointing to the @xml:id value of the relevant person as declared in <listPerson>, and an @exclude value or values pointing to the @xml:id value(s) of the incompatible alternative(s), on this model:

<cb:querent xml:id="quer1" ref="#ab" exclude="#quer2"/>
<cb:querent xml:id="quer2" ref="#cd" exclude="#quer1"/>

These alternatives may also be weighted by adding different @cert values to the different <cb:querent> elements if both or all options seem plausible but one seems more plausible than the other(s). If, however (as is almost always the case), no option seems likelier than the other(s), there is no need for a @cert value on any of them.

v) <cb:subject> (mandatory): the person(s) about whom or in whose interests the question was asked, with the same attributes as <cb:querent>. If subject and querent are or are assumed to be the same person, only <cb:subject> needs a @present value.

Just as with <cb:querent>, if there is reasonable doubt as to which of the people recorded in <listPerson> is the subject, both or all the mutually exclusive candidates should be listed, each in a separate <cb:subject> element, with an @xml:id value of subj1, subj2 (etc), a @ref value pointing to the @xml:id value of the relevant person as declared in <listPerson>, and an @exclude value or values pointing to the @xml:id value(s) of the incompatible alternative(s), on this model:

<cb:subject xml:id="subj1" ref="#ab" exclude="#subj2"/>
<cb:subject xml:id="subj2" ref="#cd" exclude="#subj1"/>

Again as with <cb:querent>, @cert values may be applied to each <cb:subject> element to establish a hierarchy of plausibility, but no @cert value is needed if neither or none of the candidates seems significantly likelier than the other(s).

In this extremely rare situation (at the time of writing, the situation has arisen only twice in over 34,000 entries), <cb:topic> must contain <person> elements for both or all the mutually exclusive candidates, each with an @xml:id value of pers1, pers2 (etc.), an @exclude value or values pointing to the @xml:id value(s) of the incompatible alternative(s) as stated within <cb:topic>, and a @sameAs value pointing to the @xml:id value of the person in question as declared within <listPerson>.

Thus, supposing that an entry has the topic lossAndTheft and that this applies either to someone called Andrew Burton or to someone called Catherine Davis (but not to both):

<listPerson> will include

<person xml:id="ab"><persName><forename>Andrew</forename> <surname>Burton</surname></persName></person>
and
<person xml:id="cd"><persName><forename>Catherine</forename> <surname>Davis</surname></persName></person>

The two possible subjects will be listed within <cb:consultation> as

<cb:subject xml:id="subj1" exclude="#subj2" ref="#ab"/>
<cb:subject xml:id="subj2" exclude="#subj1" ref="#cd"/>

and the <cb:topic> element within <cb:consultation> will appear as

<cb:topic key="lossAndTheft"><person xml:id="pers1" exclude="#pers2" sameAs="#ab"/><person xml:id="pers2" exclude="#pers1" sameAs="#cd"/></cb:topic>

If the subject is not, or is assumed not to be, the same person as the querent, this element can optionally take the @consent and/or @knowledge values yes or no if there is an explicit statement that the consultation occurred with or without the subject's consent and/or knowledge. Alternatively, @consent may take the value explicitlyUnclear if the practitioner actually expresses uncertainty as to whether the subject consented or not. (Consent, or the lack of it, is normally expressed in the records by the stock Latin phrases 'cum consensu' or 'sine consensu', or very occasionally some such formulation as 'credo sine consensu', which calls for an explicitlyUnclear value.) If there is no explicit statement one way or the other, on either count, these attributes should not be applied.

Napier has a bewildering habit of describing consultations as having occurred with or without the subject's consent when the subject is manifestly too young (e.g. 8 months old) to have expressed an opinion on the matter. Our current working hypothesis is that this in fact refers to the child's presence rather than her or his consent, so it can be used to infer a @present value. However, a @consent value should also be entered, in the interests of remaining faithful to the source, however implausible it may seem.
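
The consent and knowledge attributes might be sketched as follows. The @xml:id values are invented, each line stands for a separate entry, and the inference of present="yes" from 'cum consensu' in the last case rests on the working hypothesis just described.

```xml
<!-- 'sine consensu' -->
<cb:subject ref="#person2" present="no" consent="no"/>
<!-- 'credo sine consensu': the practitioner himself is unsure -->
<cb:subject ref="#person2" present="no" consent="explicitlyUnclear"/>
<!-- explicit statement that the subject does not know of the consultation -->
<cb:subject ref="#person2" present="no" knowledge="no"/>
<!-- 'cum consensu' recorded for an eight-month-old child -->
<cb:subject ref="#person3" present="yes" consent="yes"/>
```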

vi) <cb:messenger> (optional): if a messenger is specified, the @ref attribute points to the @xml:id value of the person in question as declared in <listPerson>.

This includes people such as servants or relatives who have brought a question. However, it is often hard to be sure whether the bringer of a question counts as querent or messenger (i.e. whether they are putting the question themselves or simply conveying it). In such cases, comment tags, editorial consultation, and as a last resort <notesStmt> are the best tools we currently have available.

vii) <cb:object> (optional): the person(s) who is/are not the principal person(s) in whose interests the question was asked but is/are highly relevant to it: e.g. if X asks whether she should marry Y, X is querent and subject, Y is object, or if A asks whether B has bewitched C, A is querent, C is subject and B is object. Or if D asks about the whereabouts or wellbeing of her absent husband E, D is both querent and subject (since it is in her own interests that she is asking) and E is the object. Again, the @ref value points to the @xml:id value of the person in question. As with querent/messenger, the borderline between subject/object is not always easy to define.
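
Two of the scenarios above might be sketched like this, with invented @xml:id values and @present values; the two groups belong to separate entries.

```xml
<!-- X asks whether she should marry Y -->
<cb:querent ref="#xx"/>
<cb:subject ref="#xx" present="yes"/>
<cb:object ref="#yy"/>

<!-- A asks whether B has bewitched C -->
<cb:querent ref="#aa" present="yes" evidence="internal"/>
<cb:subject ref="#cc" present="no"/>
<cb:object ref="#bb"/>
```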

In the unlikely event of there being two or more mutually exclusive candidates for the role of <cb:object> (as distinct from there simply being two or more <cb:object>s, which is not at all uncommon), the mechanism described above under <cb:querent> and <cb:subject> should be invoked, with @xml:id values of obj1, obj2 (etc.) on the relevant <cb:object> elements.

viii) <cb:location> (optional) with @type values of neutralGround, practitionerReceivesMessage, practitionerVisitsQuerent, querentVisitsPractitioner or unknown. practitionerVisitsQuerent and querentVisitsPractitioner are hopefully self-explanatory; the others (which are used only rarely) are explained in more detail in the entry for <cb:location> in the Element Set. Note in particular that practitionerReceivesMessage should only be used in conjunction with @ref (concerning which see below), where there is an explicit statement of where the practitioner was when he received the message. A mere statement that a message was sent, with no indication of where it was sent to, does not call for the use of practitionerReceivesMessage, as this is already apparent from the no value for @present on <cb:querent>.

<cb:location> also has an optional @ref value of e.g. #place1 or #address1, pointing to the @xml:id value of a <placeName> or <rs type="address"> in the body text (cf. <residence> above) if the location of the consultation can be more precisely specified with any confidence.

The great majority of Forman's and Napier's consultations almost certainly took place at their respective homes, but it should not simply be assumed that this is the case if there is no statement to that effect. Even a formulation such as 'he came to me' or 'she visited me' does not necessarily mean that the encounter was in the consultant's home, and indeed there are some cases where we know that Napier was 'visited' while he was staying at his brother's house in Luton or with a friend in some other town.
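
A minimal sketch of the two commonest situations, using only the @type and @ref values named above. The identifier place1 is the illustrative one already used in these Guidelines and is assumed to match a <placeName> in the body text.

```xml
<!-- 'He came to me', with no statement of where: no @ref is supplied -->
<cb:location type="querentVisitsPractitioner"/>

<!-- The record names the place of the encounter in the body text -->
<cb:location type="querentVisitsPractitioner" ref="#place1"/>
```

Omitting @ref in the first case reflects the caution above: a visit does not by itself establish that the encounter took place at the practitioner's home.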

ix) <cb:topic> (mandatory), with a @key value of any one of the categories defined in the topics master list. There may be any number of topics bar zero. Each <cb:topic> element must contain at least one <person>, <personGrp> and/or <org> element, pointing to the person or people being asked about, thus: '<cb:topic key="XXX"><person sameAs="#yy"/></cb:topic>', '<cb:topic key="XXX"><person sameAs="#aa"/><person sameAs="#bb"/><person sameAs="#cc"/></cb:topic>'. In most cases, this will be the subject, or one or more of the subjects, but the element may also, or instead, contain one or more objects. Within topics such as lossAndTheft or witchcraft, both the victim(s) and the alleged perpetrator(s), if any are mentioned, should be included here: the question of whether they should be regarded as victim(s) or perpetrator(s) will be determined by whether they have been defined as subject(s) or object(s).

There are some cases, most notably absentPerson entries, in which the person asking is the subject but the person being asked about is the object. In these instances, only the object(s) should be included as <person>s within <cb:topic>.

To summarise:

If A is asking about the welfare or whereabouts of absent person B, only B belongs in <cb:topic>, since A is not the person being asked about, although he or she counts as the subject. If A is asking whether B has robbed or bewitched C, both B and C belong in <cb:topic>, but A does not, since A is not the robbed or bewitched party, and not the supposed robber or bewitcher. If, however, A is asking whether B has robbed or bewitched A, both A and B should be <person>s within <cb:topic>.
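
The three scenarios in this summary might be encoded along the following lines. The @key values are those mentioned above; the person identifiers (personA etc.) are hypothetical and stand for whatever @xml:id values have been declared in the header.

```xml
<!-- A asks about the welfare or whereabouts of absent B: only B -->
<cb:topic key="absentPerson"><person sameAs="#personB"/></cb:topic>

<!-- A asks whether B has bewitched C: B and C, but not A -->
<cb:topic key="witchcraft"><person sameAs="#personB"/><person sameAs="#personC"/></cb:topic>

<!-- A asks whether B has robbed A: both A and B -->
<cb:topic key="lossAndTheft"><person sameAs="#personA"/><person sameAs="#personB"/></cb:topic>
```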

Where there are two or more mutually exclusive candidates for the @key value of <cb:topic>, they can be dealt with by entering two or more <cb:topic> elements with @xml:id values of topic1, topic2 etc., and @exclude values, as with <age>, <date> and <time>.
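
A sketch of two mutually exclusive topics follows. It assumes that @exclude takes a '#' pointer to the rival element's @xml:id, on the pattern used for <age>, <date> and <time>; the pairing of lossAndTheft with witchcraft and the identifier yy are purely illustrative.

```xml
<!-- Either a theft question or a witchcraft question, but not both -->
<cb:topic xml:id="topic1" key="lossAndTheft" exclude="#topic2"><person sameAs="#yy"/></cb:topic>
<cb:topic xml:id="topic2" key="witchcraft" exclude="#topic1"><person sameAs="#yy"/></cb:topic>
```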

Certain topics should only be assigned to an entry if the practitioner himself has specified them in the question section using a particular constrained vocabulary. In these cases, <cb:topic> should take a @resp value pointing to the practitioner in question, to indicate that the choice of topic is based directly on the practitioner's own categorisation rather than the editor's interpretation. The @key values in question are:

diz (where the practitioner has used the term 'diz', 'disease', 'diseased' or cognates in a medical question)
morbus (where the word 'morbus', however declined, features in the question section of a medical entry)
morbusPassions (where the word 'morbus', however declined, features in the question section of a passions entry)
nonMedicalState (where the word 'stat', 'state' or 'status' features in the question section of a non-medical entry)
passionsDiz (where the practitioner has used the term 'diz', 'disease', 'diseased' or cognates in a passions question)
sickness (where the word 'sick' or 'sickness' features in the question section of a medical entry)
sicknessPassions (where the word 'sick' or 'sickness' features in the question section of a passions entry)
state (where the word 'stat', 'state' or 'status' features in the question section of a medical entry)
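
For instance, a medical entry whose question section uses the word 'diz' might carry a topic of the following shape. The value practitioner1 is a hypothetical stand-in for the practitioner's actual @xml:id, and the '#' pointer form of @resp is an assumption.

```xml
<!-- Topic taken directly from the practitioner's own wording ('diz') -->
<cb:topic key="diz" resp="#practitioner1"><person sameAs="#yy"/></cb:topic>
```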

In the case of <cb:topic key="liveOrDieMedical"> (as opposed to plain liveOrDie, which is for questions that are not, or at least not obviously, medical in nature), an @evidence value of internal should be applied if either the fact that the question is about whether someone will live or die or the fact that the question is medical in nature is only apparent from the untranscribed section.

If the topic cannot be even conjecturally defined (because it has been lost through damage, it is unstated, it is stated so cryptically that none of the editors can make sense of it, or it has been deleted to the point of indecipherability), <cb:topic> takes a @key value of unknown and a @reason value of msDamage, notGiven, notUnderstood or deleted. If a question concerns more than one topic, it is permissible for one or more to have the @key value unknown even if the other(s) can be more precisely defined.
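
By way of illustration, an entirely unstated topic, and an entry combining one identifiable topic with one lost through damage, might look like this (the identifier yy and the choice of witchcraft are illustrative only):

```xml
<!-- Topic not stated at all -->
<cb:topic key="unknown" reason="notGiven"><person sameAs="#yy"/></cb:topic>

<!-- One topic identifiable, a second lost through manuscript damage -->
<cb:topic key="witchcraft"><person sameAs="#yy"/></cb:topic>
<cb:topic key="unknown" reason="msDamage"><person sameAs="#yy"/></cb:topic>
```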

x) <cb:item> (optional) with a @type value of urine, letter, token or blood. Normally this will only feature if urine or blood samples, a letter, or some other unspecified token is explicitly mentioned in the record, but it may take the @present values explicitlyNo (if the consultant actually notes the absence of such an item) or unclear (if there is some reason to suppose the item was probably present but no conclusive evidence).
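
The three possibilities described here can be sketched as follows, using only the @type and @present values given above:

```xml
<!-- Urine explicitly mentioned as brought or sent -->
<cb:item type="urine"/>

<!-- The consultant notes that no letter was brought -->
<cb:item type="letter" present="explicitlyNo"/>

<!-- A token was probably present, but the evidence is inconclusive -->
<cb:item type="token" present="unclear"/>
```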

xi) <cb:judgment> (optional): included if a judgment section is given.

xii) <cb:treatment> (optional): included if a specific treatment is recorded, or if there is an explicit statement that no treatment was offered.

xiii) <cb:recipe> (optional): included if a specific recipe is spelled out, normally (but not always) preceded by the glyph '&Rx;'. The distinction between 'treatment' and 'recipe' is somewhat subjective and may have to be sorted out on a case-by-case basis, but as a general guide 'prepare purg and bleed' is a treatment, '&Rx; Iuniper berries steeped in dew' is a recipe. The two are not, of course, mutually exclusive.

xiv) <cb:info> (optional), with a @type value of event, financial, urine or angel. This records the presence of information about any of these four apparently discrete matters.

<cb:info type="event"/> is used if information has been added about previous or subsequent events relevant or potentially relevant to the case: 'her former physick wrought not at all', 'he died 3 dais after' or the like. It takes the mandatory @subtype value previousConsultation or subsequentEvent. Information about previous events should be limited to remarks about previous consultations or treatments administered by the current practitioner. Details about past events in the patient's private life or treatments offered by other practitioners should be considered part of the judgment. Reports of subsequent events (which, by definition, constitute addenda to the original records) can be construed more broadly, including subsequent treatments not prescribed by the practitioner (if they are prescribed by the practitioner they count as <cb:treatment>), or reports of the subsequent effects of treatment prescribed by the practitioner, e.g. '24 stooles it gave her' (CASE10635).
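
Using the two quoted phrases as examples, and assuming that the 'former physick' was prescribed by the current practitioner, the corresponding header elements might be:

```xml
<!-- 'her former physick wrought not at all' -->
<cb:info type="event" subtype="previousConsultation"/>

<!-- 'he died 3 dais after' (an addendum to the original record) -->
<cb:info type="event" subtype="subsequentEvent"/>
```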

<cb:info type="financial"/> applies if there is any explicit record of payment or other financial details relating to the consultation. In Napier's case, this quite often includes a statement that no charge was made, usually expressed as 'gratis' or simply 'gr'. There are also many instances in Napier's records of payments-in-kind, usually involving agricultural produce ('shee brought me apples', 'he sent me a paire of Conyes' etc.). In both cases, these should be recorded as <cb:info type="financial" /> with a comment tag briefly transcribing or summarising the information.
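
A sketch of both situations, each with its accompanying comment tag. The initials 'xx' are a placeholder for the commenter's own.

```xml
<!-- Napier notes 'gr' (gratis): no charge made - xx -->
<cb:info type="financial"/>

<!-- payment in kind: 'shee brought me apples' - xx -->
<cb:info type="financial"/>
```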

Distinguishing financial information can be problematic, thanks to Napier's habit of stating the weight as well as the price of medicines in pence. Generally speaking, a quantity expressed at least partly in pounds and/or shillings can be taken as <cb:info type="financial" /> but one expressed only in pence cannot, unless it is preceded by some other indication of the weight of the medicine in question (which may also be in pence). The term 'ob.', which is frequently appended to quantities given in pence, is short for 'obolus', meaning 'halfpenny', and may refer to either weight or price.

<cb:info type="urine"/> is included if there is a description of the patient's urine (as opposed to merely a record of its having been brought or sent), suggesting that urine observation is being used as a diagnostic tool. In a fairly small number of cases, it is a matter of editorial judgment whether an account of the patient's urine should be considered symptom description or analysis, but to take two relatively straightforward examples, 'his urine scalding hot' is symptom description but 'water too high coloured' is analysis.

In April 1611, Napier took to consulting the archangels Asariel, Gabriel, Michael, Raphael and Uriel, and a somewhat lower-ranking angel called Aladiah (and possibly others) for advice about diagnoses, prognoses and treatments. Quite how he communicated with them remains something of a mystery but his records of their advice are clearly of considerable interest and the presence of such records in a given entry should be documented using <cb:info type="angel"/>, with a mandatory @subtype value of Aladiah, Asariel, Gabriel, Michael, Raphael, Uriel or Unspecified as the case may be.

These records tend to be extremely cryptic and can be hard to spot, but they are typically much more categorical than Napier's own determinations. Whereas Napier usually hedges his own judgments with formulae such as 'it seemeth that ...', 'videtur ...' or the like, those of the angels are presented with an air of confident finality, as statements of concrete fact: 'she will die', 'non est gravida' etc. (though in the case of favourable prognoses often with a proviso such as 'by the grace of god', 'with Gods help' etc.). They are normally (but not always) prefaced by the capitalised initial of the archangel in question (A for Asariel, G for Gabriel, M for Michael, R for Raphael or V for Uriel) or a truncated form of his name (e.g. 'Gab' for Gabriel or 'Mich' for Michael). Current thinking is that Aladiah is normally given in truncated form as 'Al' (or in one instance spelled out in full as 'Alladia', which is how we identified him), but where only a letter 'A' is given it is advisable to add a comment tag of the form <!-- TODO CB ANGEL --> pointing out that it may mean Aladiah instead of Asariel.
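
The following sketch shows an unambiguous attribution and an ambiguous one flagged for review. The choice of Asariel as the provisional @subtype in the second case is an assumption for the purposes of the example.

```xml
<!-- Advice prefaced by 'R' -->
<cb:info type="angel" subtype="Raphael"/>

<!-- Advice prefaced by a bare 'A' -->
<cb:info type="angel" subtype="Asariel"/>
<!-- TODO CB ANGEL: 'A' may mean Aladiah rather than Asariel -->
```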

All the customised <cb:xxxx> elements listed above may take @cert and/or @evidence values if there are grounds for doubt about whether the element should be included or, in the case of <cb:topic>, what the @key value should be.

The presence of astrological and/or geomantic charts is noted in the <body> section of the document (see Content Tagging, section 5) and does not need to be recorded in the header.

Where any sort of information that would normally be noted in the metadata, such as judgment, treatment, urine note, financial information or event information, is present in the record but has been wholly deleted, it should not be recorded in the header, but can (and preferably should) be mentioned in a comment tag. Where such information is only partially deleted, however, the remaining section still needs recording.

III. <revisionDesc>

records the work done on the electronic file, and consists of an indefinite series of <change> elements each with an ISO @when value giving the date, and content giving a natural-language account of each significant revision. The person who made it goes in a <name> element with an @xml:id value that will be assigned by the senior editors when someone starts work, but is normally based on that person's first initial plus her or his surname, e.g. mhawkins, lkassell, rralley or jyoung for Michael Hawkins, Lauren Kassell, Robert Ralley or John Young.

Proofreading a file counts as a <change> even if the proofreader has not in fact made any changes to it. The file itself may not have changed but its status has.

If the @xml:id value of any of the people working on the document has previously been declared, subsequent occurrences of her or his <name> take a @sameAs value of # followed by the relevant string.

For instance:

<revisionDesc>
  <change when="2009-12-12">Data entered in Microsoft Excel spreadsheet by <name xml:id="rralley">Robert Ralley</name> as part of the Casebooks Pilot Project.</change>
  <change when="2010-07-05">XML file created by <name xml:id="mhawkins">Michael Hawkins</name> from Casebooks Pilot Project data.</change>
  <change when="2010-07-09">Transcribed by <name xml:id="jyoung">John Young</name>.</change>
  <change when="2010-08-13">Checked by <name sameAs="#rralley">Robert Ralley</name>.</change>
</revisionDesc>

The <teiHeader> is followed immediately by a <facsimile> element containing one or more <graphic> element(s) which point, by means of the @url value, to the filename(s) of the image(s) on which the transcription is based. This information is entered by the technical director before transcription begins (or, in the case of legacy files based on 'home-made' images, supplied by him when canonical images become available). The editors need only concern themselves with it in two situations: if an entry runs to more than one page, in which case they need to supply one or more further <graphic> element(s) as explained in the entry for <graphic> in the Element Set, or in the very unlikely event of there being a mismatch between the @url value of <graphic> and the actual URL of the relevant image. These values will already have been vetted by both the technical director and a senior editor, but editors are encouraged to double-check them (this may sound horrendously complicated but in practice any such mismatch should be glaringly obvious). If something does seem to be wrong, editors should inform the technical director and not attempt to correct it themselves.


Where it has been deemed appropriate to clarify original spelling or terminology using modernised or standardised forms, the usual mechanism is to use <orig> and <reg> tags within <choice>. Normally, the content of <orig> will appear in the diplomatic view and the content of <reg> in the normalised. However, there is the additional option of <reg type="gloss">, offering a succinct editorial explanation of any text that even specialist users may need help with, such as the use of planet symbols to mean days of the week. This will appear as a mouseover in the diplomatic version. Text may have both a normalisation and a gloss: for instance the 'p' in '45 p 6', occurring within an English passage, would almost certainly have been written as 'post' had Forman not abbreviated it, but what it means to a modern reader is 'past'. This is encoded as '<choice><orig>p</orig><reg>post</reg><reg type="gloss">past</reg></choice>', which is displayed in the diplomatic view as 'p' and in the normalised as 'post', in both cases with a mouseover reading 'past'. This section (hopefully) establishes the protocols for what to normalise and how.

1) In general, semantically insignificant distinctions between letter forms can be disregarded: thus, short and long 's', Greek and Roman 'e', medial and terminal 'f', and so forth, need not be differentiated.

However, some letter forms are of sufficient intrinsic interest to warrant distinct encoding. The letter 'thorn', used as an abbreviation of 'th' but written exactly or almost exactly like 'y', should be encoded as the entity &thorn;. This provides the option of expanding it to 'th' in the normalised view and presenting it either as 'y' or as the Unicode thorn character ('þ') in diplomatic. If a word begins 'ff' (functioning like a capital 'F', as still happens in names such as 'ffion'), this should be encoded as the entity &ff;.

2) Unless a scribal hand does distinguish between upper case I/J and/or U/V, these upper case forms should be treated as I and V but provided, if necessary, with a regularisation, thus: '<choice><orig>I</orig><reg>J</reg></choice>ones', 'his <choice><orig>V</orig><reg>U</reg></choice>nkle'. These two <choice> strings can be rendered by the entities &IConsonant; and &VVowel; respectively. Where 'I' and 'V' do equate to modern 'I' and 'V', there is obviously no need to regularise.

Lower case i/j and u/v are more complicated since all four can represent either vowels or consonants. These should, where appropriate, be regularised using the entities &jVowel; for '<choice><orig>j</orig><reg>i</reg></choice>' (i.e. 'j' being used as a vowel), and &iConsonant;, &vVowel; and &uConsonant; on the same principle.
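
On the principle stated for &jVowel;, the other three entities would presumably expand as shown below; the in-house entity set is the authority on the exact expansions. The sample words ('vp' for 'up', 'loue' for 'love') are invented for illustration, and the entities are assumed to be declared by the project's entity set.

```xml
<!-- &iConsonant; : 'i' used as a consonant -->
<choice><orig>i</orig><reg>j</reg></choice>
<!-- &vVowel; : 'v' used as a vowel -->
<choice><orig>v</orig><reg>u</reg></choice>
<!-- &uConsonant; : 'u' used as a consonant -->
<choice><orig>u</orig><reg>v</reg></choice>

<!-- In use: 'vp' and 'loue' -->
&vVowel;p
lo&uConsonant;e
```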

Roman numerals, whether lower or upper case, can generally be left as they stand unless they are combined with Arabic numerals in a single number, e.g. 'the 3i March', which should be normalised to 'the 3<choice><orig>i</orig><reg>1</reg></choice> March'. However, where lower-case 'j' is used to mean '1', either as a numeral in its own right or as the last part of a longer Roman numeral, it should be normalised as '<choice><orig>j</orig><reg>i</reg></choice>' (or for brevity's sake as &jVowel;, despite its not actually being a vowel in this instance).

Initial 'UU' and 'VV' should be encoded as '<choice><orig>UU</orig><reg>W</reg></choice>' or '<choice><orig>VV</orig><reg>W</reg></choice>'.

Very few early modern hands distinguish between the 'ae' ('æ') and 'oe' ('œ') ligatures, so these should always be transcribed as &aelig; unless the scribe in question clearly does differentiate between them, in which case the 'oe' ligature should be transcribed as &oelig;.

3) Capitalisation should not be imposed on or removed from the source text. A sentence or proper noun beginning in lower case should be transcribed as such, without any regularisation being offered. Similarly, a word occurring mid-sentence with an initial capital (e.g. 'flushing heates wind Rising in the stomak') should not be given a regularised form. However, it is often very hard to say whether or not an initial letter is capitalised, or whether the author/scribe himself would have been able to say whether it was. In such cases, the choice of upper or lower case is left to the transcriber's/editor's judgment, which may be informed by the context: proper nouns, and words at the beginning of sentences, are more likely to be considered capitalised; conjunctions and prepositions not at the beginning of sentences are less likely to be considered capitalised – but it is impossible to give a hard and fast ruling on this.

4) Standard types of abbreviation and shorthand should be provided with regularisations using <orig>/<reg>: '<choice><orig>hims&flourish;</orig><reg>himself</reg></choice>'; '<choice><orig>&crossedp;</orig><reg>pre</reg></choice>pare'.

a) In general, words should only be expanded if there is some form of brevigraph, overlining or other explicit scribal indication that an abbreviation is intended (such as the use of superscript in 'Bp' for 'Bishop'). A full stop after a truncated word, however, should not be regarded as an abbreviation indicator, since the use of full stops is so inconsistent and ambivalent that it would be rash to ascribe any semantic value to them.

This applies to proper nouns just as much as to any other words. 'Willm' should be expanded as 'William' ('<choice><orig>W<hi rend="overline">illm</hi></orig><reg>William</reg></choice>'), but 'Eliz' or 'Eliz.' stays 'Eliz' or 'Eliz.'. Personal names are being expanded and regularised in the header according to fairly well-defined conventions recorded on the project's wiki so there is no need to replicate this in the body text unless there is explicit indication of abbreviation.

A very common brevigraph is the overlining of a vowel or of the letters 'm', 'n' or 'y' to indicate a following 'm' or 'n'. This can be rendered e.g. 'mel<choice><orig><hi rend="overline">a</hi></orig><reg>an</reg></choice>colyk': it is up to the transcriber to deduce whether the omitted letter is 'm' or 'n' (this is usually self-evident but occasionally ambiguous). The string '<hi rend="overline">a</hi>' can be encoded as &aover; (or &eover;, &iover; etc). (This sort of overlining is often referred to as a 'macron' but this is technically wrong: a macron is a stress mark, not a brevigraph, although it looks pretty much identical.) Note that the entity &aover; does not stand in for either 'am' or 'an' but merely provides the content for <orig>: the expansion still needs to be spelled out in <reg>.
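
The 'melancolyk' example can be written in either of two equivalent ways, the second using the entity for the content of <orig> only:

```xml
<!-- Overlined 'a' written out in full -->
mel<choice><orig><hi rend="overline">a</hi></orig><reg>an</reg></choice>colyk

<!-- The same, with &aover; standing in for the <hi> string; <reg> is still spelled out -->
mel<choice><orig>&aover;</orig><reg>an</reg></choice>colyk
```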

The in-house entity set covers a large number of similar standard brevigraphs, such as a q-followed-by-a-tail to mean 'que' (&que;) and a character like a 9 meaning either 'us' at the end of a word or 'con' at the beginning of one (&uscon;). Both practitioners use a loop-like character representing 'es' at the end of an English word or (much less frequently) 'is' at the end of a Latin one, e.g. 'heat' followed by the loop for 'heates'. This should be transcribed 'heat<choice><orig>&loop;</orig><reg>es</reg></choice>'. This choice string can be rendered by the entity &pluralLoop; provided the loop does mean 'es' (even if it does not in fact represent a plural); if it is used to mean 'is' (or anything else) the choice string needs to be coded in full (though the character itself can still be encoded as &loop;).

Forman also uses an upstroke at the end of a word, sometimes continuing leftward over the top of at least part of the word, as a sort of all-purpose abbreviation mark: this should be transcribed as &flourish; and expanded as appropriate using <orig>/<reg>. More often than not, however, the &flourish; character appears merely to be decoration rather than an abbreviation mark: these should be recorded but cannot be explicated (since they appear not to mean anything).

There is also a brevigraph very similar to a flourish which runs backward over the top of the preceding letter and sometimes terminates in what might be construed as a superscript 'r'. It means either 're' or 'er', and should be transcribed as &loopedr;, with a normalisation (using <orig>/<reg>) of either 're' or 'er', at the editor/transcriber's discretion. However, if 're' or 'er' is abbreviated as superscript 'r' with no loop, it should be transcribed as '<choice><orig><hi rend="superscript">r</hi></orig><reg>re</reg></choice>' or '<choice><orig><hi rend="superscript">r</hi></orig><reg>er</reg></choice>'.
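
A sketch of the two cases follows. The words 'water' and 'present' are invented for illustration and are not taken from the casebooks.

```xml
<!-- Looped brevigraph read as 'er' -->
wat<choice><orig>&loopedr;</orig><reg>er</reg></choice>

<!-- Plain superscript 'r' with no loop, read as 're' -->
p<choice><orig><hi rend="superscript">r</hi></orig><reg>re</reg></choice>sent
```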

The precise distinctions between 'superscript r', 'looped r', 'flourish' and 'superscript r that just happens to be attached to the preceding letter because the writer couldn't be bothered to take his pen off the page' can be (and have been) debated at considerable length but are not among this project's primary concerns: the important thing is the content of <reg> (i.e. the intended reading).

It is important to bear in mind that some entities represent fully expanded code strings, such as '&que;' meaning '<choice><orig>q&tail;</orig><reg>que</reg></choice>' or '&jVowel;' meaning '<choice><orig>j</orig><reg>i</reg></choice>', while others merely represent the content of the <orig> section of <choice>, such as '&aover;' and '&crossedp;'. The full expansion of each entity is clearly spelled out in the in-house entity set.

b) There are, however, some conventional abbreviations not explicitly flagged as such that occur with such frequency (and might look so suspiciously like transcriptional or typographical errors) that they seem worth regularising. Examples are:

'wth' written on the line (rather than 'w' with superscript 'th') for 'with' ('<choice><orig>wth</orig><reg>with</reg></choice>')
'wch' written on the line (rather than 'w' with superscript 'ch') for 'which'
'p' for 'post' (in consultation times, e.g. '15 p 8'). This can be dealt with by the entity &past;, which generates '<choice><orig>p</orig><reg>post</reg><reg type="gloss">past</reg></choice>'.

c) The assorted ways in which 'ante meridiem' and 'post meridiem' are expressed in the records should be normalised as 'am' and 'pm' (lower case, no full stops). Most of these can be dealt with by the entities &Anflmfl;, &anflmfl;, &pmfl;, &Anmfl;, &anmfl;, &amfl;, or &antm; for Napier's habitual 'ant m'. In these cases, 'fl' stands in for a flourish, and it is assumed that there is a space before the concluding 'm'. Thus &Anflmfl; generates '<choice><orig>An&flourish; m&flourish;</orig><reg>am</reg></choice>'. Other variants, and examples where the am/pm formula includes other tagging (e.g. <lb>, <del>) or punctuation (e.g. 'ant. m'), should be encoded in full.
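
To illustrate the difference between a form covered by an entity and one that has to be encoded in full:

```xml
<!-- What &Anflmfl; generates -->
<choice><orig>An&flourish; m&flourish;</orig><reg>am</reg></choice>

<!-- 'ant. m' includes punctuation, so no entity applies and it is coded in full -->
<choice><orig>ant. m</orig><reg>am</reg></choice>
```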

Where, however, 'ante meridiem' and 'post meridiem' are abbreviated to 'am' and 'pm' in the original, no regularisation is needed, even if the original rendition includes spaces and/or full stops that would not normally appear in a modernised version, e.g. 'a. m.' or 'p m'.

Forman has a tendency to omit the 'meridiem' part of his ante meridiems: this is too common a habit to be regarded as a mistake so should not be tagged <sic>, but should be regularised to 'am' as normal, e.g. '<choice><orig>An&flourish;</orig><reg>am</reg></choice>'.

Forman uses 'in m', 'in mer' etc. (meaning 'in meridie', i.e. 'at midday') to mean 'between 11.00 and 13.00' (and occasionally applies it to times even less close to noon than that). This should be given both a normalisation and a gloss: '<choice><orig>in m&flourish;</orig><reg>in meridie</reg><reg type="gloss">around noon</reg></choice>'. The commonest forms of this can be rendered by the entities &inm;, &inflmfl; and &inmfl; on the same principle as am and pm above; any other variants should be hand-crafted.

Napier and Gerence James, however, with a very few exceptions, use 'in m' or 'in mer' more conventionally to mean 'at noon' precisely: in these cases the content of the gloss should be 'at noon' rather than 'around noon'. However, on the occasions when they do obviously use it Forman-fashion, it should of course be coded as above.
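
The two usages differ only in the content of the gloss. The unadorned 'in mer' in the second example is assumed to carry no flourish.

```xml
<!-- Forman: loosely 'around noon' -->
<choice><orig>in m&flourish;</orig><reg>in meridie</reg><reg type="gloss">around noon</reg></choice>

<!-- Napier or Gerence James: noon precisely -->
<choice><orig>in mer</orig><reg>in meridie</reg><reg type="gloss">at noon</reg></choice>
```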

d) Abbreviations that are still standard (aside from the question of superscripting) can be left as they stand, e.g. 'Mr', 'Mrs', 'Dr', which can be coded as 'M<hi rend="superscript">r</hi>', 'M<hi rend="superscript">rs</hi>' and 'D<hi rend="superscript">r</hi>' respectively. 'Mr' and 'Mrs' would have been understood at the time as 'Master' and 'Mistress' rather than 'Mister' and 'Missus' but expanding them as such would probably only sow confusion. However, if they really do mean 'master' or 'mistress' in modern terms – e.g. 'the Mrs for the maid' – they should be given normalisations. '&c' (for etcetera) can be left as it stands, coded '&amp;c' (and does not need to be flagged as <foreign> if it occurs in a passage in English).

Napier generally abbreviates 'mistress' as 'Mres', embellished with whatever combination of superscripts and overlinings has taken his fancy at the time. If functioning as role names, these should all be regularised as 'Mrs', e.g. '<choice><orig>M<hi rend="superscript">res</hi></orig><reg>Mrs</reg></choice>', '<choice><orig>M<hi rend="overline">res</hi></orig><reg>Mrs</reg></choice>'.

e) Astrological and alchemical symbols should be transcribed using the entities provided in the in-house entity set (&SagittariusSymbol;, &TaurusSymbol;, &VenusSymbol;, &SunSymbol; etc.) and glossed using <orig>/<reg>. This enables us to differentiate between the use of planetary symbols to mean gods/days/metals/planets, e.g. '<choice><orig>&VenusSymbol;</orig><reg type="gloss">Venus</reg></choice>', '<choice><orig>&VenusSymbol;</orig><reg type="gloss">Friday</reg></choice>'. By far the commonest occurrence of these in the question sections is the use of planet symbols for days of the week. These can be dealt with by the entities &Monday;, &Tuesday; etc., generating e.g. '<choice><orig>&MoonSymbol;</orig><reg type="gloss">Monday</reg></choice>'.

Napier goes through phases of writing days in Latin even in otherwise entirely English passages, e.g. 'die' followed by a Venus symbol for 'Friday' or 'die' followed by a Moon symbol for 'Monday'. These should be given both a Latin regularisation and an English gloss: '<choice><orig>die &VenusSymbol;</orig><reg>die Veneris</reg><reg type="gloss">Friday</reg></choice>'. Provided he has spelled the word 'die' out in full (as he usually does), this choice string can be represented by the entity &FridayN; (or &SaturdayN;, &SundayN; etc.). However, if the 'die' part of the day name is given as 'd', 'd.' or anything else, the full choice string requires handcrafting.

Conversely, Napier sometimes omits the word 'die' and simply refers to a day by its planet symbol even when he is writing Latin. Here the missing 'die' needs to be supplied in the regularisation (since by no stretch of the imagination can the word 'Venus' on its own be construed as meaning 'Friday' in Latin). In these cases, which will only occur in documents or parts of documents flagged as being in Latin, the symbols should be expanded as e.g. '<choice><orig>&VenusSymbol;</orig><reg>die Veneris</reg><reg type="gloss">Friday</reg></choice>'. These choice strings can be represented by the entities &FridayLatin;, &SaturdayLatin; (etc.).
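
Napier's Latin day names therefore fall into three cases, which might be sketched side by side as follows. The handcrafted 'd.' form is an illustration modelled on the entity expansions given above.

```xml
<!-- 'die' spelled out before the symbol: what &FridayN; represents -->
<choice><orig>die &VenusSymbol;</orig><reg>die Veneris</reg><reg type="gloss">Friday</reg></choice>

<!-- 'die' abbreviated to 'd.': no entity applies, so the string is handcrafted -->
<choice><orig>d. &VenusSymbol;</orig><reg>die Veneris</reg><reg type="gloss">Friday</reg></choice>

<!-- Symbol alone in a Latin passage: what &FridayLatin; represents -->
<choice><orig>&VenusSymbol;</orig><reg>die Veneris</reg><reg type="gloss">Friday</reg></choice>
```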

f) The coding, representation and explication of the seemingly idiosyncratic cipher Forman sometimes uses, chiefly to disguise personal names in cases in which he has some personal involvement, remains to be established.

g) Fractions are almost invariably spelled out in full in the casebooks, e.g. 'halfe', '3 quarters'. In the extremely rare event of a vulgar fraction being used in the original text (at the time of writing this has arisen precisely once in over 40,000 files), it should be encoded on this model: '<formula><math xmlns=""><mfrac><mn>1</mn><mn>2</mn></mfrac></math></formula>' (1 over 2) (CASE34854).

5) In other respects, the spelling of the original should generally be preserved, without any orthographic gloss or normalisation, e.g. 'melancoly', 'bene' (for 'been'), 'tymorous'. However, glosses may be introduced, using <orig>/<reg> tags, at the transcriber's/editor's discretion, if the original spelling seems likely to cause serious problems for non-specialist users. For instance, if 'wickes' were judged to be too obscure a spelling of 'weeks' for popular consumption, it could be tagged as '<choice><orig>wickes</orig><reg type="gloss">weeks</reg></choice>'. Note, however, that a number of such potentially confusing spellings (including 'wickes') are mentioned on the Glossary page of the project's website and do not call for regularisation or glossing in the transcriptions.

6) The distinction between 'normalisation' and 'correction' is a very fine one, but if the transcriber/editor deems some part of the text to be an error on the author/scribe's part, the original text and the editorial amendment should be encoded, respectively, in <sic>/<corr> tags within <choice>. For instance, '<choice><sic>squncy</sic><corr>squincy</corr></choice>' (CASE10117). Note that, as in this example, the content of <corr> should be what (the transcriber thinks) the original author would have corrected his mistake to if he had noticed it, even if that still looks 'wrong' to a modern reader (in this case, a modernised version would read 'squinsy', meaning suppurative tonsillitis). The <corr> element has @cert and @resp values with which the encoder can (and should, if there is any doubt) record who proposed the correction and how sure he/she is about it, on a scale of high (pretty confident)/medium (in two or three minds)/low (educated guess).

As in the above example, the <sic> and <corr> elements should contain whole words (or whole numerals), even if only one character requires correction. This makes for a cleaner online display and sidesteps the problem of what to do, for instance, about Napier's habit of writing 'mayed' when he means 'maryed', where what needs to be corrected is the omission of a character.
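
Two short illustrations follow. The first applies the whole-word rule to 'mayed'. The second adds @cert and @resp to the 'squncy' example; the values shown are illustrative rather than those actually recorded for CASE10117, and the '#' pointer form of @resp (here reusing the identifier jyoung from the <revisionDesc> example) is an assumption.

```xml
<!-- Whole word in <sic> and <corr>, although only one character is missing -->
<choice><sic>mayed</sic><corr>maryed</corr></choice>

<!-- Correction with certainty and responsibility recorded -->
<choice><sic>squncy</sic><corr cert="medium" resp="#jyoung">squincy</corr></choice>
```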

In the event of the author/scribe inadvertently duplicating text, or including completely irrelevant text, the <corr> part of the <choice> element takes the @type value noText and has no content, e.g. 'Isabel ... Carter of <choice><sic>of</sic><corr type="noText"/></choice> 36 yeres' (CASE1115). Where it appears that the author/scribe intended to delete text but failed to do so, an empty <corr> element with the @type value delText can be used similarly. The borderline between noText and delText is often open to dispute and may call for discussion on a case-by-case basis.

<sic>/<corr> combinations should also be used where the day, date or (less often) time given in the text is manifestly wrong, e.g. if an entry for Wednesday 6 Sept is followed by one for Thursday 6 Sept or an entry for 3 Jan 1597 is followed by one for 3 Jan 1596. It is usually clear from the context which part of the entry is erroneous but not always (e.g. in the first example above, is the 'correct' version Wednesday 6 or Thursday 7?). The transcriber/editor should use her/his judgment in such cases, noting any doubts in comment tags. The metadata should give the corrected day/date/time, with appropriate @cert values in debatable cases. See the Header section of the Guidelines, subsection II. 5) a), for instructions about what to do when both options seem plausible.
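
A sketch of how the first example above might be handled if the transcriber concluded that the date, rather than the day, was at fault. The @cert value, the initials and the wording of the comment are illustrative assumptions, not a ruling on any actual entry.

```xml
<!-- previous entry is dated Wednesday 6 Sept, so 6 is probably a slip for 7, though the day could equally be wrong (xx) -->
Thursday <choice><sic>6</sic><corr cert="medium">7</corr></choice> Sept
```

Note that <sic> and <corr> each hold the whole numeral, in line with the whole-word rule given under 6) above.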

Where the content of <corr> is a symbol or abbreviation that would otherwise be supplied with an expansion using <orig>/<reg>, it should not be expanded within <corr>. Instead, the entire <choice> string should nest within the <orig> element of a further <choice> string, with the regularisation of the corrected version appearing as the content of <reg>. By far the commonest example of this in Napier's casebooks is his peculiar habit of writing 'h' where he manifestly intends a Saturn symbol meaning 'Saturday', which is rendered as '<choice><orig><choice><sic>h</sic><corr>&SaturnSymbol;</corr></choice></orig><reg type="gloss">Saturday</reg></choice>'.

Where the converse is the case, e.g. if Napier (as he sometimes does) uses a Saturn symbol where he manifestly intends an 'h', no regularisation of the erroneous symbol or abbreviation is needed: '<choice><sic>&SaturnSymbol;</sic><corr>h</corr></choice>'. If both the content of <sic> and that of <corr> are symbols or abbreviations, e.g. he writes a Mercury symbol meaning Wednesday when he should have written a Mars symbol meaning Tuesday, the procedure is essentially the same as in the h-for-Saturn-symbol case, with no regularisation of the erroneous form required: '<choice><orig><choice><sic>&MercurySymbol;</sic><corr>&MarsSymbol;</corr></choice></orig><reg type="gloss">Tuesday</reg></choice>'.

7) Punctuation (or the lack of it) should be recorded as it stands, without any sort of regularisation. The virgule, or slightly-wobbly-forward-slash character, which is clearly a punctuation mark of some sort but cannot be confidently equated with any modern punctuation mark, should be recorded as the entity &slash;. As with any other punctuation mark, it should follow the preceding text immediately, with no space in between (except in cases where a space clearly is intended, which it quite often is).

8) For the purposes of project documentation, we will normalise or regularise rather than normalizing or regularizing.

Format Tagging

1) The consultation records and their component parts do not normally feature headings, but any that occur should be tagged <head>, with a @rend value of center if they are centred. <head> is only permitted as the first thing in a <div> or <lg>.

2) Paragraphs in prose should be tagged <p>. There is normally no need to record variations in the formatting of paragraphs (e.g. indentation of first line, right-alignment, centring), though a @rend value is available, as explained under <p> in the Element Set, in case the need for it should arise.

3) The beginning of any new line of prose should be marked <lb>. This includes the first line of a paragraph or heading: although <lb> is formally defined as 'line break' it is more helpful to think of it as 'line beginning'. Where a line break occurs between words there should be a space after the first word but not before the second. Where it occurs mid-word or at the beginning of a paragraph or heading there should be no space either side of it.

Line breaks that appear to be there for a reason but do not represent the beginning of a new paragraph (e.g. to split up a heading) should be tagged <lb type="intentional"/>. These are the only ones that will normally be rendered in the display, which is why the spacing around <lb> is so important.

If a word is hyphenated at a line break, <lb> takes the @type value hyphenated and the hyphen itself is not otherwise recorded. The exception is a 'hard' hyphen that would have appeared anyway, e.g. 'star-<lb/>chart': here the hyphen is transcribed and the <lb> does not require any attributes.
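
The spacing and @type rules for <lb> can be seen side by side in the following sketch. The sample wording is invented (apart from 'star-chart', which is the example given above), and the choice of which breaks count as intentional is purely illustrative.

```xml
<!-- start of paragraph: no space either side of <lb/> -->
<p><lb/>she hath a great payne in her
<!-- break between words: space after 'her', none before 'hed' -->
<lb/>hed and is very melan<lb type="hyphenated"/>coly
<!-- above: word hyphenated at the line end, hyphen not transcribed -->
<lb/>as appeareth by the star-<lb/>chart</p>
<!-- above: 'hard' hyphen kept, <lb/> without attributes -->

<!-- heading deliberately split over two lines -->
<head><lb/>A generall note<lb type="intentional"/>of the sicke</head>
```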

4) Any text string in verse (even if it is only one line long or less) should be tagged <lg> (line group) and each individual line (or incomplete line) within it tagged <l>.

5) Whitespace, i.e. space left blank between lines or within a line of text, should be encoded as <space>. The @dim (dimension) value records whether the space is horizontal or vertical; the @extent value is a number indicating the extent of the space; and the @unit value states what units it is measured in (normally chars, i.e. characters, for horizontal whitespace or lines for vertical whitespace). It is clearly impossible to be mathematically precise about the @extent value: a reasonable approximation of how many characters or lines would have fitted into the space is sufficient.
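
A minimal sketch of both kinds of whitespace. The wording and the @extent figures are invented, and the spacing around the element is an assumption: the Element Set remains the authority on exactly where <space> may appear.

```xml
<!-- a gap of roughly six characters left within a line, e.g. for a name never filled in -->
<lb/>Mr <space dim="horizontal" extent="6" unit="chars"/> of london

<!-- roughly two lines left blank on the page -->
<space dim="vertical" extent="2" unit="lines"/>
```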

6) Text that has been distinctively rendered in some way, e.g. underlined, italicised, superscripted etc., should be tagged <hi> with a @rend value of e.g. underline, doubleUnderline, overline, italic, bold, superscript, subscript, large, small. We will not, however, attempt to record all the changes in size of Forman's erratic handwriting, or the fact that he normally puts part at least of the subject's name in much larger letters than the rest of the text. If such fluctuations do seem to be relevant to the interpretation of the text (e.g. deducing who is the subject in ambiguous cases), this can be mentioned in a comment tag.

<hi> can also be used to indicate a change of writing medium, e.g. a @rend value of redInk to indicate that a word or phrase is written in red ink, or redUnderline to indicate that a text string not written in red has been underlined in red. (If underlining occurs within <hi rend="redInk">, the underlining is presumed also to be in red if there is no indication to the contrary.)
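
The following sketch shows a few @rend values in use, including the red-ink cases. The sample wording is invented; only the attribute values come from the list above.

```xml
<hi rend="underline">Nota</hi>
M<hi rend="superscript">r</hi> Cutler

<!-- whole phrase in red ink; the nested underlining is presumed to be red as well -->
<hi rend="redInk">she died <hi rend="underline">the same day</hi></hi>

<!-- text in ordinary ink, underlined in red -->
she died <hi rend="redUnderline">the same day</hi>
```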

7) The vast majority of both Forman's and Napier's casebooks are laid out in a format of two columns per page, referred to for Casebooks purposes as column A (quadrants 1 and 2) and column B (quadrants 3 and 4). If the question section of any consultation, or any other material we transcribe, crosses column or page breaks, these should be marked by <cb> or <pb> respectively, with @xml:id and @n values indicating the designation of the new column/page. The @xml:id value of <cb> should be colA or colB (it is not at all unusual, especially in Napier's records, for an entry to begin in the right-hand column and continue in the left). The @n value is simply a or b. The @xml:id value of <pb> is the page or folio number preceded immediately by 'p' if the document is numbered by page or 'f' if it is numbered by folio, e.g. p76, f147r. The @n value is the same number without the preceding 'p' or 'f': 76, 147r. Since the header will contain a detailed description of the layout of the material, there is no need for the @xml:id and @n values to be more precise or expressive than this, even in the case of complex counter-intuitive shifts.

If the <cb> or <pb> tag occurs in mid-paragraph, one space should be left on either side of it unless it occurs in mid-word, in which case there should be no space on either side. If a word split by a page or column break is hyphenated, both tags can take the @type value hyphenated, as with <lb>. In the vast majority of cases, however, these tags occur between paragraphs or at the end of the transcribed section of the entry, and the spacing around them is irrelevant.
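
As a rough illustration of the @xml:id, @n and spacing conventions for <cb> and <pb>, using a few words from an entry quoted elsewhere in these Guidelines. The position of the breaks is invented, and <lb> tags are left out for brevity.

```xml
<!-- column break between words: one space either side -->
to see him being <cb xml:id="colB" n="b"/> extreme sicke

<!-- page break in mid-word, no hyphen in the original: no space either side -->
to see him being extr<pb xml:id="f147r" n="147r"/>eme sicke

<!-- word hyphenated across a page break in a paginated volume -->
to see him being extr<pb xml:id="p76" n="76" type="hyphenated"/>eme sicke
```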

<cb> and <pb> should nest in <div> if they occur part way through the transcribed text, but be placed outside <div> if they do not. If there is a chart (which is considered a <div> in its own right) and a column or page break between two transcribed sections of an entry, each section should be encoded as a separate <div>, with the <cb> or <pb> element appearing as either the last thing in the first <div> or the first thing in the second <div>, depending on which column or page the chart appears on.

Where a page, or part of a page, has been divided horizontally rather than vertically, so that text proceeds in linear fashion from column A to column B at several points within the file, no <cb> tags are needed: the fact that the entry occupies more than one quadrant is adequately expressed by the content of <bibl type="positionOnPage">.

If, as is normally the case in the medical records proper rather than the more discursive guides to astrology or other texts, the entry occupies only (part of) one column or one uncolumnated page, no <cb> or <pb> tags are needed: <bibl type="positionOnPage"> gives users as much information as they need.

If one of the two columns is itself subdivided into smaller columns, this can be mentioned in a comment tag but the 'mini' column break does not need to be recorded in the coding.

In the extremely rare event of text moving on to the same page or column more than once within a given file, e.g. an entry begins in column A, continues in column B, reverts to column A and then moves back into column B a second time, the two @xml:id values should be distinguished by appending a hyphen and a numeral to the normal value, e.g. colA-1, colA-2, or f46r-1, f46r-2, but the two @n values will be identical. The same applies, with regard to <cb>, in the even rarer event of an entry covering more than one page and having a column break on each.

8) Added text is tagged <add>, with a @place value of supralinear (above the line), infralinear (below the line), inline (neither higher nor lower than the surrounding text but obviously added later), interlinear (added text that itself runs to more than one line), over (physically overwriting an earlier text string), marginRight (in the right margin), marginLeft (in the left margin), pageTop (at the top of the page) or pageBottom (at the bottom of the page).

<add> may nest in <add>, in which case the @place value of the nested <add> refers to where it appears relative to the inserted text it nests in. For instance, an infralinear insertion into a supralinear insertion may still be above the line of the main text into which the supralinear insertion is inserted, but nonetheless takes the @place value infralinear.
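
A small sketch of an insertion within an insertion, with invented wording. The inner @place value describes where 'sore' sits relative to the supralinear phrase, not relative to the main line of text.

```xml
she hath bene <add place="supralinear">very <add place="infralinear">sore</add> sicke</add> thes 3 dayes
```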

9) Deleted text is tagged <del>, normally with one of the following @type values: blockStrikethrough for whole sections struck through en bloc, strikethrough for a text string crossed out by a continuous horizontal line, cancelled for any heavier deletion, erased for text that has been rubbed or scraped away from the original document, or over for cases where one text string overwrites another, functioning simultaneously as deletion and replacement. Text tagged <del type="over"> will always, by definition, be followed immediately by text tagged <add place="over">.

However, where the deletion appears to function as a deliberate suppression of information rather than simply as (part of) a revision or correction, the deleted text should be tagged <del type="redactedCancelled"> or <del type="redactedBlockStrikethrough">. (The latter is, for obvious reasons, very rare, but occasionally there is a clear intent to suppress the whole entry, which has been achieved by heavily deleting the salient points and then striking through the entire passage en bloc.)

Where added text replaces deleted text, the two strings should nest in a <subst> element and the deleted text should be transcribed first. This applies even in cases where the caret mark or other insertion indicator appears, physically, before the <del>. If the added text has the @place value over and/or it replaces a text string that is only part of a word, number or other textual unit, it should follow the deleted text with no space between the two elements. Otherwise, one space should be left between the <del> and <add> elements.
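
The two spacing cases for <subst> might look like this. The wording is invented for the purpose; the element and attribute values are those described above.

```xml
<!-- whole word replaced by a supralinear insertion: one space between <del> and <add> -->
the <subst><del type="strikethrough">cat</del> <add place="supralinear">dog</add></subst> sat on the mat

<!-- part of a number overwritten: deleted text first, no space between the two elements -->
march 2<subst><del type="over">8</del><add place="over">9</add></subst>
```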

Except in the case of overwriting, it is not always obvious whether an addition does replace a deletion, as opposed to just happening to occur at the same point. <subst> should only be used if the transcriber/editor is reasonably confident that it really does represent a substitution.

If <add> and <del> are co-extensive - i.e. the added text has been deleted in its entirety but the surrounding text is undeleted - <del> should nest directly within <add> rather than vice versa. However, <add> may nest within <del> if the insertion represents part of a longer text string that was subsequently deleted in its entirety.
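
To make the two nesting orders concrete, here is a sketch with invented wording.

```xml
<!-- co-extensive: the insertion 'very' was itself struck out, the rest stands -->
she is <add place="supralinear"><del type="strikethrough">very</del></add> sicke

<!-- the insertion forms part of a longer passage that was later deleted as a whole -->
<del type="strikethrough">she is <add place="supralinear">very</add> sicke</del>
```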

If the content of <del> is a symbol or abbreviation that would otherwise be supplied with an expansion using <orig>/<reg>, or a mistake that would otherwise be supplied with a correction using <sic>/<corr>, the deleted text should not be expanded or corrected: e.g. '<del type="strikethrough">M<hi rend="overline">res</hi></del>'.

<del> may also nest within <del> if it appears that some part of a text string had been deleted before the longer text string was.

10) The hand or hands in which the document is written are recorded in the <handNotes> section of the <teiHeader>. If more than one hand features, there are two ways of distinguishing them in the body text:

a) If the main text has been written in one hand and subsequently altered by another hand, the identity of the second hand can be noted by the @hand attribute applied to the tags that record its interventions. For instance, supposing A (the principal scribe) wrote 'the cat sat on the mat' and B changed it to 'the dog sat on the mat', and they are given the @scribe values scribeA and scribeB respectively in the <handNotes> section, this would be tagged 'the <subst><del type="strikethrough" hand="#scribeB">cat</del> <add place="supralinear" hand="#scribeB">dog</add></subst> sat on the mat' (i.e. the deletion - though not the deleted word itself - and the addition are both in hand B).

b) If the main text simply changes from one hand to another at some point, this can be marked with the empty element <handShift>, placed immediately before the first character in the new hand, with the @new value linking to the code for the new hand, e.g. 'the cat <handShift new="#scribeB"/>sat on the mat' indicates that A wrote the text up to 'the cat' and then B took over.

11) Marginal and other notes are tagged <note> and transcribed at the point in the text to which they refer. If a note indicator (such as an obelus, asterisk or superscript character) is present, this should be recorded as the @n value of <note>, using an entity if necessary. If there is any doubt about which point in the text a note refers to - and/or whether a given portion of text counts as a note or not - this should be mentioned in a comment tag. The physical location of the note can be recorded in the @place value of <note> as infralinear, inline, interlinear, lineBeginning, lineEnd, marginLeft, marginRight, pageTop, pageBottom (cf. <add>), or chart (for notes added inside an astrological chart, as often happens in Napier's manuscripts). marginLeft and marginRight should be seen as relative to the point they relate to: for instance, notes that actually occur in the middle of the page as a whole should be considered marginRight if they pertain to a point in the left column or marginLeft if they pertain to a point in the right column. These values are currently constrained but can be expanded if it proves necessary.
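
A hypothetical marginal note keyed to the text by an asterisk might be encoded along these lines. The wording is invented, and the sketch assumes that the indicator is recorded only in @n and that the note sits immediately after the word it refers to; the Element Set should be checked on both points.

```xml
she hath taken physick<note place="marginLeft" n="*">but to no purpose</note> thes 3 wickes
```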

Unclear, Illegible or Omitted Text

1) Uncertain or conjectural readings should be tagged <unclear>, with a @cert value indicating the degree of certainty about the reading on a scale of high (pretty confident), medium (doubtful) or low (an educated guess). The reason why the text is unclear is expressed by the @reason value, which may be any of the following:

binding: text has been rendered unclear by over-zealous binding
bleedthrough: text has been rendered unclear by ink bleeding through from the other side of the folio
blot
blotDel: something that could be either an accidental blot or a deliberate deletion
copy: poor quality of the copy being used. By the end of the project these should, ideally, all have been checked against the original, but this is a useful way of keeping track of what needs special attention
damage: the MS is damaged in some way. Where there is significant damage to a given manuscript or page, the exact nature of the damage can be described in the <notesStmt> section of the <teiHeader>
del: deleted
faded
foxed
over: text is hard to read because it overwrites other text: NB not because it has itself been overwritten, which counts as del
hand: lousy handwriting. This is the default option if none of the others applies.

<unclear> may contain any quantity of text, from a single letter within a word to a number of whole words (unless it violates element boundaries, which is extremely rare: if it does, it will have to be presented as two or more consecutive <unclear>s).
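
The following sketch shows <unclear> at three scales. The choice of @reason and @cert values is invented for illustration, as is the underlining in the last line.

```xml
<!-- two letters within a word obscured by a blot -->
tym<unclear reason="blot" cert="high">or</unclear>ous

<!-- a whole word that is little more than a guess -->
<unclear reason="hand" cert="low">melancoly</unclear>

<!-- unclear reading straddling the end of <hi>: split into two consecutive <unclear>s -->
<hi rend="underline">very <unclear reason="faded" cert="medium">melan</unclear></hi><unclear reason="faded" cert="medium">coly</unclear>
```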

2) Any text string that is missing entirely from the surviving manuscript (normally through damage), or is wholly illegible for whatever reason, and cannot be even conjecturally supplied, should be tagged <gap>, with @reason values as for <unclear>, a numerical @extent value, and a @unit value of chars, words or lines (always plural, even if the @extent value is 1). The @extent value does not need to be too precise: it is obviously impossible to tell exactly how much text has disappeared under a large blot, but useful to give a general idea of the scale of the omission. Alternatively, if a reasonably accurate guess is impossible (e.g. the bottom half of a page has been torn off so the loss could be anywhere between zero and three hundred words), @extent takes the value unclear and no @unit value is needed.

If it is unclear whether text is missing or not (typically in cases of MS damage or binding), <gap> can also take an optional @cert value of high, medium or low.
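
A few hedged examples of <gap>, with invented figures and surrounding wording.

```xml
<!-- about three words lost to damage -->
she hath bene sicke <gap reason="damage" extent="3" unit="words"/> thes 3 wickes

<!-- a single character under a blot: @unit stays plural -->
melanc<gap reason="blot" extent="1" unit="chars"/>ly

<!-- bottom of the page torn off: no sensible estimate, so no @unit -->
<gap reason="damage" extent="unclear"/>

<!-- tight binding may or may not be hiding a few characters at the line end -->
<gap reason="binding" extent="2" unit="chars" cert="low"/>
```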

3) Material that is missing or illegible but can be supplied, if only conjecturally, should be tagged <supplied>, with @reason and @cert values as for <unclear>, except that if there is no reasonable doubt as to the content, no @cert value is needed. If the only apparent reason for an omission is authorial or scribal absent-mindedness, it can be rectified using <sic>/<corr>: '<choice><sic>40 9</sic><corr>40 &past; 9</corr></choice>' (CASE2847).
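
By way of a sketch, <supplied> with and without @cert (the damage described is invented):

```xml
<!-- final letters lost in the gutter and not in any reasonable doubt: no @cert needed -->
Mr Cutlers ho<supplied reason="binding">use</supplied>

<!-- word lost to damage and supplied conjecturally -->
extreme <supplied reason="damage" cert="medium">sicke</supplied>
```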

Content Tagging

The transcription itself is housed in the <text> element of each file, which follows <facsimile>. <text> in turn contains <body>, which for our purposes nests directly inside it. (<text> can in principle contain other things too but these are not normally relevant to Casebooks transcriptions.) <body> contains one or more <div>s (divisions, i.e. self-contained sub-sections).

Unless the original document contains no text at all, and never did, or the language cannot be ascertained (because the original text has been wholly lost or what remains of it consists only of proper nouns, symbols or formulae such as '15 p 4' which could equally well be regarded as being in English or in Latin), <text> takes an @xml:lang value of en, la, el, fr or he (English, Latin, Greek, French or Hebrew) stating the principal language of the document, irrespective of whether or not other languages feature in it too. If the original language cannot be ascertained, the @xml:lang value should be und (undetermined). If there was never any text there in the first place (this is extremely rare), no @xml:lang value is required. In other cases, the principal language will usually if not always be English or Latin.
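
In outline, and assuming a document whose principal language is English, the nesting described above looks like this. Any attributes that <div> may require are omitted here; the Element Set is the authority on those.

```xml
<text xml:lang="en">
  <body>
    <div>
      <!-- transcribed text of the entry goes here -->
    </div>
  </body>
</text>
```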

In some cases, the text is split more or less 50/50 between English and Latin. Here it does not really matter which is regarded as the principal language provided any text strings in the other are marked up as <foreign> (see section 4 below).

Overwhelmingly the greatest part of the contextual information (names, dates, places, roles, nature of queries) is dealt with in the header. However, there are certain features of the body text that require specific encoding.

1) Addresses. Where an address is specified, however vaguely, for any of the people listed in <listPerson> in the header, the relevant text string should be marked up in the body text as <rs type="address" xml:id="address1"> (or <rs type="address" xml:id="address2">, <rs type="address" xml:id="address3"> etc. if more than one address is specified): this is pointed to by the @sameAs value of <residence> in the header.
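
A sketch of the simplest case, in which a single address in the body text is pointed to from the header. The name and place are invented, and both the '#' prefix on the @sameAs value and the empty form of <residence> are assumptions modelled on the pointer syntax used elsewhere in these Guidelines, to be checked against the Element Set.

```xml
<!-- in the body text -->
Jone Smith of <rs type="address" xml:id="address1">Newport</rs> 40 yeres

<!-- in the header, within the relevant person's record (assumed form) -->
<residence sameAs="#address1"/>
```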

Sometimes an address is split into two or more component parts with intervening text entirely extraneous to the address, e.g. 'my Cosen Stocker intreated me to goe to Mr Cutlers house to see him being extreme sicke & I came thither in london march 29' (MS Ashmole 334, f. 6r), where 'Mr Cutlers house ... in london' constitutes an address in its own right but the intervening text forms no part of it. This can be dealt with by using the @xml:id, @next and @prev (previous) attributes of <rs>, on this model:

'my Cosen Stocker intreated me to goe to <rs type="address" xml:id="address1" next="#address2">Mr Cutlers house</rs> to see him being extreme sicke &amp; I came thither <rs type="address" xml:id="address2" prev="#address1">in london</rs> march 29'

In such cases, the <residence> element in the header points to the first of the available @xml:ids (in this case address1).

Where a previous address is mentioned in the entry, <residence> takes a @notAfter ISO value of the last plausible date at which it might have been a current residence (the day before the consultation if no other evidence is available) or a @to value of the date on which it ceased to be one if this can be ascertained.

2) Other place names. Where other geographical locations are specified, however vaguely and in whatever context, they should be marked up as <placeName>. This can be anything from a room to a continent, and includes past or prospective addresses. In the comparatively rare instances where such place names need to be linked to from the header (for instance because they record places of birth, marriage or death), <placeName> takes an @xml:id value of place1 (or place2, place3 etc. if more than one occurs in a given file). If they are simply places mentioned in passing, e.g. 'An Knight of half a yere suckled in <placeName>Gothurst</placeName>' (CASE10196), no attribute is required.

3) Ship names. These crop up quite a lot in Forman's entries and occupy a grey area between 'address' and 'geographical location'. They should be marked up as <name type="ship">.

4) Languages. Any part of the document not in the main language should be tagged <foreign> (this can include English if the main language is not English), with an @xml:lang value of en, la, el, fr or he (English, Latin, Greek, French or Hebrew). In Forman's case at least, 'Latin' can be construed as including 'dog Latin'. <foreign> may nest in <foreign> if, for instance, there are a couple of letters in Greek within a Latin interpolation into an entry otherwise in English.
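
For instance, a Latin phrase containing a Greek letter, within an entry otherwise in English, might be tagged as below. The wording is invented purely to show the nesting.

```xml
she is <foreign xml:lang="la">in periculo mortis <foreign xml:lang="el">θ</foreign></foreign> as I suppose
```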

5) Charts. Where astrological and/or geomantic charts are present, these should be noted as &astroChart; and/or &geoChart;, each of which constitutes a <div> in its own right. Deleted charts should be recorded as &astroChartDeleted; or &geoChartDeleted;. If the charts are incomplete, or, in the case of geomantic charts, represent the preliminary working for the finished chart (whether or not a finished chart is also present), they should be recorded as &astroChartPartial;, &astroChartPartialDeleted;, &geoChartPreliminary; or &geoChartPartial;. Where the outline of a chart has been drawn up but no data at all entered, it should be recorded as &astroChartBlank; or &geoChartBlank;. Where a chart-size space has been left on the page without so much as a preliminary grid or outline, it should be recorded as &astroChartSpace;.

In the fairly rare case of charts occurring within (rather than after) the question section of an entry, the transcription needs to be divided into two or more <div>s with the chart(s) appearing in between them.
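
A rough outline of an entry whose question section is interrupted by an astrological chart. This assumes, on the strength of the statement that each chart constitutes a <div> in its own right, that the chart entity stands directly between the two text <div>s without a wrapper of its own.

```xml
<body>
  <div>
    <!-- first part of the question section -->
  </div>
  &astroChart;
  <div>
    <!-- remainder of the question section -->
  </div>
</body>
```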

Document last modified: 8 November 2013

Cite this as: Casebooks Project (Transcription Guidelines), accessed 2017-02-25.