Difference between revisions of "Team Giovanni/Patrizia"

Revision as of 10:01, August 31, 2012

Team Giovanni/Patrizia

Team Colin

Editorial history

23/08/12: CSG, created page






Suggested links


Team Colin
Team Jill
Team William

TEI: Text Encoding Initiative
TEI Lite



Tasks for the week



Week commencing 20th August 2012




Week commencing 30th August 2012


- Patrizia, how should we deal with quantities and currencies database-wise? <quantity value="hour">6. howers</quantity> OR <quantity value="hour">6</quantity>. howers? I suggested the latter.
Giovanni: I agree
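
A minimal sketch of why the second form is easier to process later. It assumes Python's standard xml.etree.ElementTree; the helper name read_quantity and the wrapping <line> element are hypothetical, added only so that the fragment parses on its own:

```python
import xml.etree.ElementTree as ET

def read_quantity(fragment):
    """Return (amount, unit) for the first <quantity> in an XML fragment.

    Hypothetical helper: it assumes the tag content is a bare integer,
    as in the second form discussed above.
    """
    root = ET.fromstring(fragment)
    q = root.find("quantity")
    return int(q.text), q.get("value")

# Second form: only the number sits inside the tag, so it converts directly.
clean = '<line><quantity value="hour">6</quantity>. howers</line>'
print(read_quantity(clean))

# First form: the tag content is "6. howers", so int() raises ValueError.
mixed = '<line><quantity value="hour">6. howers</quantity></line>'
try:
    read_quantity(mixed)
except ValueError:
    print("content is not a bare number")
```

With the number alone inside the tag, the original wording ("howers") stays in the transcription while the value can go straight into a database column.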

- We'll need to decide which elements we want in the header, to mimic some of a TEI one. If you manage to give a thought about this, it would be great.
I'm afraid I'm not familiar with the structure of the papers until now. Which are the elements that we want to include?

- One further concern: units of meaning in the text (depositions, cases, etc.). We need to identify them properly, I think. Follow-up: we should probably have one header per document (i.e. per picture), and a separate header for cases and depositions (each case contains many depositions, scattered across different pages). Ideally we'll have some metadata to associate the document with a picture, but we'll also represent the units of meaning of the source, which is paramount.
Patrizia: isn't this something that should mimic the structure of the national archives? (I mean the documentary unit)



Week commencing 3rd September 2012




Useful email records



Colin to Giovanni, Patrizia, Charlene; 31/08/12; 10:56


In answer to Giovanni's points:

(1) Foliation

The foliation I am citing in the names of images, e.g. HCA 13/71 f.101v is modern archival block printed foliation. Each folio number is printed in the top RH corner of the verso page. There is no original foliation in this volume

- Foliation is variable in HCA volumes and records. Bound volumes account for 95% of all HCA records at TNA. However, many (possibly most) such bound volumes have neither modern (archival) nor older handwritten foliation.

- Where there is original foliation, I presume it was added at some stage, probably contemporaneous with the creation of the bound volume.

- Note that the bound volumes are not pre-bound prior to being filled out by clerks. They consist of folded leaves which have been stitched together to create one to two year groupings (occasionally for a longer period) of court records.

- Occasionally there are clear collating errors, since some later pages (in order within the volume, whether or not actually foliated) bear deposition or other type of legal statement dates before earlier pages in the same volume
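
A collating error of this kind could be flagged automatically once page dates have been transcribed. The following is only a sketch under stated assumptions: the function name out_of_order is hypothetical, and the folios and dates in the sample are invented for illustration, not taken from any HCA volume.

```python
from datetime import date

def out_of_order(pages):
    """Return the (folio, date) pairs dated earlier than a page before them.

    `pages` is a list of (folio, date) tuples in volume order.
    """
    flagged = []
    latest = None
    for folio, d in pages:
        if latest is not None and d < latest:
            flagged.append((folio, d))   # earlier date on a later page
        else:
            latest = d
    return flagged

# Invented sample data, in the order the pages are bound.
sample = [
    ("f.1r", date(1656, 3, 1)),
    ("f.2r", date(1656, 3, 4)),
    ("f.3r", date(1656, 2, 20)),
    ("f.4r", date(1656, 3, 6)),
]
print(out_of_order(sample))
```

A flagged page is only a candidate for a collating error; an editor would still need to check the manuscript.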

For further information on volumes and bindings see: Leather bindings

For further information on foliation and example see: Double page spreads

For further information on the range of document types (which we will NOT see in HCA 13/71, but which exist in other HCA records), see:

- Interrogatories: Numbered paragraphs. Interrogatories are in the HCA records series HCA 23/XX (the image shows a page view of Interrogatories for HCA 23/19, which are numbered questions). In HCA 13/71, part of the legal statements we are transcribing are answers to the Allegations ("allon") which have been made (the allegation is also a formal court document, but is not included in HCA 13/71); and part of the statements (but not always) are answers to the Interrogatories, with reference being made in the answer to the number of the interrogatory. These numbered answers do not cite the text of the interrogatory, but often paraphrase the language of the interrogatory in giving the answer. I am hoping that as part of the linking exercise, either in this project or later in 2013, it will be possible to go to the appropriate volume of interrogatories, image these pages, and then link the pages to the answers we will have transcribed for HCA 13/71.

(2) Data types within our chosen HCA 13/71 volume

For further information on data types within HCA 13/71, see Page layout, and specifically Typical single leaf layout

- The page illustrated in the Typical single leaf layout is the first page in the volume HCA 13/68 (f. 1r), and thus comes from the same record series as our chosen volume, HCA 13/71
- The page shows most of the characteristics we need to understand and (to your point, Giovanni) possibly identify as sub-areas within the record

      • Case summary details top left


      • Date of the court session (here it was the 22nd of September 1659, written as shown below)


The 22:th day of September 1659

Note the modern block printed folio number in the top right hand corner (here it is folio one, and should be described when transcribing as f. 1r: folio one recto; recto = right, or front)

      • Brief statement as to nature of the legal record (here it is an examination)


Examined upon an Allon on the behalfe of
the sayd Keepers of the Liberty of the Liberty of England by
Authority of Parliament

      • Witness name, place of abode, and estimated age at top of main text


Mark Harrison of Wapping in
the County of Midds Mariner aged
seven and twenty yeares or thereabouts
a witnes sworne and examined deposeth and
saith as followeth. vizt

      • Abbreviation in left hand margin "Ren:dt" (contraction for Latin word, XXX = XXX)


      • Number in left hand margin stating which number witness in the specific legal case (here this is the first witness)


      • Main body of text (here consisting of thirty seven lines, divided into four paragraphs)


      • Paragraphs in main body of text introduced with the phrase "To the first (second/third/fourth) arle of the sayd allon this deponent saith and deposeth that..."


      • First word of next page at bottom right hand side of page, below end of main text


Missing from this page is a sense that the "main body of text" (in terms of the vast proportion of the words of an individual record relating to one witness statement) can have several sections

- The typical sections (though not always present) can be seen by looking sequentially at the six images below, which are from successive folio sides in our chosen volume. I have added my comments on structure by each image link below

HCA 13/71 f.99v P1130401

  • This page starts just after the introductory material, which is on the previous page (HCA 13/71 f.99r)


  • The page starts part of the way through the answer of the deponent to the second article (arle) of the allegation (allon)

- confusingly the first full paragraph starts "To the third hee saith...", but it is clear from the next paragraph that the transcriber is within the response to the article portion of the witness deposition ("To the 4th arle hee saith...")

  • From inspection of the previous page XXXX I have added the following metadata for this page. The case summary is a verbatim extract from the relevant part of the front matter on the previous page. Likewise the deposition metadata. Note that "4." refers to the deposition of Charles Anquestil being the fourth deposition, of which there is at least one more, which comes later in the volume


<header>
<folio>HCA 13/71 f.99v</folio>
<picture>P1130401</picture>
<case-summary>The <person>Lord Protector</person> against a certaine shipp called the <ship>fortune</ship>: whereof <person>Daniel <help>Curetson</help></person> is <profession>Master</profession> taken with wynes, and againt the <person>Earle of Charott</person> and <person>others</person> Owners of a shipp of warr called the <ship>golden Eagle</ship> of <place>Callice</place>, and against the said <person>Earle of Charrott</person> and others owners of the <ship>Royal Mary</ship>, a shipp of warr and against all others</case-summary>
<deposition>4. <person>Charles Anquestil</person>, of <place>Callice</place> in <place>ffrance</place> <profession>Mariner</profession> and <profession>Gunner</profession> of the said shipp the <ship>Mary Royall</ship>, aged <quantity value="year">40</quantity>: yeares or thereabouts a Wittnesse sworne and examined saith as followeth</deposition>
<document-date normalized="25/02/1655"></document-date>
<status>First cut transcription completed; Requires editorial input</status>
<first-transcriber>Colin Greenstreet, 28/08/12</first-transcriber>
</header>
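
To illustrate how a header of this shape could feed a database later, here is a minimal sketch using Python's standard xml.etree.ElementTree. The header below is a shortened copy of the one above, and the dictionary keys are hypothetical choices rather than an agreed schema. Note that the parser only accepts properly nested tags: overlapping pairs raise a ParseError.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the header above (case summary omitted for brevity).
header_xml = """<header>
<folio>HCA 13/71 f.99v</folio>
<picture>P1130401</picture>
<deposition>4. <person>Charles Anquestil</person>, of <place>Callice</place> in <place>ffrance</place> <profession>Mariner</profession></deposition>
<document-date normalized="25/02/1655"></document-date>
</header>"""

header = ET.fromstring(header_xml)

# Collect the fields a database row might need.
summary = {
    "folio": header.findtext("folio").strip(),
    "picture": header.findtext("picture"),
    "people": [p.text for p in header.iter("person")],
    "places": [p.text for p in header.iter("place")],
    "date": header.find("document-date").get("normalized"),
}
print(summary)
```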

HCA 13/71 f.100r P1130402

  • This page continues the answers of the deponent to the allegation


  • Note that the page starts partway through the deponent's answer to the 6th article of the allegation. This is presumed, but would need to be checked against the previous page, to ensure that it is not in fact the answer in one paragraph to multiple articles. Having checked the previous image and transcription, I see that it is indeed just the answer to the single article 6 (LINE 46 of HCA 13/71 f.99v P1130401: "To the 6th hee saith...")


  • Note that some paragraphs address more than one article of the allegation (e.g. "LINE 6: To the 7th and 8th arles...")


  • Note that there are apparently eleven articles in the original allegation, since the deponent answers eleven articles, but with the last numbered article being the tenth ("LINE 34: To the 10th hee saith..."). The presumed eleventh is introduced as "LINE 42: To the last hee saith..."


  • Towards the bottom of this page, a new section of the same deposition begins, and is marked by the centred heading:


"LINE 45: To the Crosse Interries:-TEXT IS CENTRED"

- The first and second cross-interrogatories are addressed on this page
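
The paragraph openings noted for this page ("To the 6th hee saith", "To the 7th and 8th arles", "To the last hee saith") are regular enough that a script could propose which articles a paragraph answers. This is only a sketch: the function name article_refs is hypothetical, the word list covers only a few ordinals, and it expects the paragraph text without the "LINE n:" prefix.

```python
import re

# Spelled-out ordinals seen in the examples; "last" is kept as a label
# because the number of articles is not stated in the deposition itself.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5, "last": "last"}
TOKEN = re.compile(
    r"\b(\d+)(?:st|nd|rd|th)\b|\b(first|second|third|fourth|fifth|last)\b",
    re.IGNORECASE,
)

def article_refs(line):
    """Return the article numbers named in a paragraph opening, or []."""
    if not line.startswith("To the"):
        return []
    # Only look at the text before "arle(s)" or "hee saith".
    head = re.split(r"\bhee saith\b|\barles?\b", line, maxsplit=1, flags=re.IGNORECASE)[0]
    refs = []
    for num, word in TOKEN.findall(head):
        refs.append(int(num) if num else ORDINALS[word.lower()])
    return refs

print(article_refs("To the 7th and 8th arles hee saith"))
print(article_refs("To the last hee saith"))
print(article_refs("To the Crosse Interries:-"))
```

A heading such as "To the Crosse Interries" yields an empty list, which could itself serve as a hint that a new section of the deposition has begun.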

HCA 13/71 f.100v P1130403

  • The top third of this page contains the remainder of the deposition which I labelled in the metadata as


<deposition>4. <person>Charles Anquestil</person>, of <place>Callice</place> in <place>ffrance</place> <profession>Mariner</profession> and <profession>Gunner</profession> of the said shipp the <ship>Mary Royall</ship>, aged <quantity value="year">40</quantity>: yeares or thereabouts a Wittnesse sworne and examined saith as followeth</deposition>

  • This deposition ends with a signature, which is the original signature of the witness, and is, I believe, a signature to the verity of the whole clerical record, including the front data. Logically I guess it is a separate data type. Remember some signatures are "marks", where the witness is illiterate, or semi-literate, and a standard piece of analysis would be to look at literacy by occupation, witness type, date etc.


  • The lower two thirds of this page is a new deposition, prefaced by new front material


  • I have recorded the metadata for this part of the page as:


<header>
<folio>HCA 13/71 f.100v</folio>
<picture>P1130403</picture>
<case-summary>The <person>Lord Protector</person> against a certaine shipp called the <ship>fortune</ship></case-summary>
<deposition><person>Charles Anquestil</person>, of <place>Callice</place> in <place>ffrance</place> <profession>Mariner</profession> and <profession>Gunner</profession> of the said shipp the <ship>Mary Royall</ship>, aged <quantity value="year">40</quantity>: yeares or thereabouts a Wittnesse</deposition>
<document-date normalized="25/02/1655"></document-date>
<status>First cut transcription completed; requires checking</status>
<first-transcriber>Colin Greenstreet, 29/08/12</first-transcriber>
</header>

  • Note that I have chosen myself to summarise the case-summary material, which I had recorded verbatim in the previous metadata entry, which is clearly bad practice


  • Note that the front material does not formally repeat the date which was on HCA 13/71 f.99r. Instead it states "same day", so the transcriber needs to track back in the images (or the transcriptions) to see which day is being referred to


  • Note that the deposition in the lower two thirds of the page is for the same case as on the previous page, and is the fifth deposition


- The text of the front material states:

LINE 27. 5.us/ John Mercier of callice in france Mariner Quartermaster
LINE 28. Of the sayd shipp Goulden Eagle aged <quantity value="year">29</quantity>. or thereabouts a [?Wittnes?s?e?]
LINE 29. sworne and examined saith as followeth./

  • This fifth deposition starts by addressing the same allegation as the fourth deposition, and this page has the answer to the first article and part of the answer to the second


LINE 31: . To the first Arle of the said Allegation hee saith, That hee this deponent 31
LINE 49: To the second arle hee saith, That the said shipps being at Sea in...

  • As a very small point of detail (but details can cause problems later for finer grained markup), I have had to deal with the use of the indefinite article as a quantity and marked it up accordingly:


LINE 88. friggat about <quantity value="league">a</quantity> league of the English Coast, shee not being then

  • Note that I have not marked up the bottom of this page
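
The indefinite-article case noted above ("a" tagged as a quantity of leagues) needs a rule when quantities are turned into numbers. A minimal sketch, assuming that "a" and "an" should be read as 1; the function name quantity_amount is hypothetical:

```python
def quantity_amount(text):
    """Turn the content of a <quantity> tag into a number.

    Assumption for illustration: "a"/"an" stand for 1; anything else
    must be a plain integer, optionally followed by a full stop.
    """
    cleaned = text.strip().rstrip(".").lower()
    if cleaned in ("a", "an"):
        return 1
    return int(cleaned)

print(quantity_amount("a"))    # content of <quantity value="league">a</quantity>
print(quantity_amount("40"))   # content of <quantity value="year">40</quantity>
```

Anything that is neither an article nor an integer raises ValueError, so unexpected content surfaces for an editor instead of being silently guessed.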


HCA 13/71 f.101r P1130404

  • Nothing remarkable


  • No front matter, since on the previous page, and deposition goes over to the following page


HCA 13/71 f.101v P1130405

  • I have not finished transcribing or marking up this manuscript page


  • It contains cross-interrogatories


LINE 31: 31. To the Crosse Interries/:-Centre heading

  • It is signed by Feban Merchier (I think I am correct in reading this as an "F"), which is possibly a variant of the modern French name "Fabian", and poses a name equivalence point which we will occasionally face on other non-English deponents, since the name was anglicised in the front matter as "John Mercier"


  • Please note that there is a seventeen folio jump between images HCA 13/71 f.101v P1130405 and HCA 13/71 f.118r P1130406, since I had already photographed f.102r to f.117v earlier. Giovanni, I know this is not ideal.


Well, rather a long email, but I hope this helps.

I am posting the email to the Team Giovanni/Patrizia discussion area, to which I am also posting earlier relevant emails:

   Useful email records
       Giovanni to Patrizia, Colin, cc. Charlene; 31/08/12: 09:04
       Giovanni to Patrizia, Charlene, Stuart, Colin, Jill, William; 31/08/12: 08:36
       Patrizia to Giovanni, Colin 30/08/12; 23:25
       Patrizia to Giovanni, Colin, Charlene, Stuart, Jill, William: 30/08/12: 23:22
       Giovanni to Patrizia, Charlene, Colin, Jill, William; 30/08/12: 12:05
       Patrizia to Giovanni, Colin: 29/08/12; XXX




Giovanni to Patrizia, Colin, cc. Charlene; 31/08/12: 09:04


cc. Charlene since I think she needs to comment and be aware of this.

Regarding documental structure, units of meaning in the text and the cataloguing from the NA.

I feel we need to know, in detail, what kind of records they have at the National Archives about our documents, if they have some summary (Patrizia, I mean the regesto, am I correct?), and what's the archivistic unit/s we'll transcribe (also in broader context). This, because we need to be consistent.

Secondly, we need to know what the structure of the documents is, and how it relates (if it does) to pagination (by the way, is there some sort of pagination or foliation? Is it original or made by modern archivists? We need to track it in the header if so). Colin, you mentioned to me that we have cases and, within each case, 1+ depositions. Is this all? Are there other identifiable units of meaning? How do these relate to pagination? I see there can be many depositions on one page, and anyway they can start anywhere. Is it the same for cases, or do they start on a blank page?

We'll probably need a few tags to deal with all this: the header is meant to describe the document being transcribed, as a picture (which is our arbitrary assumption: it's a unit of work, not of meaning, but one we cannot avoid using). We'll then need (probably) a case tag (with a summary? date? parties?) and a deposition tag (again, summary, date, witness, ...). These will deal with units of meaning, not of (our) work. Needless to say, we need them both: for us these documents are pictures, but the original was/is divided into cases and the like.
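
To make the distinction concrete, here is a sketch of the two kinds of unit as Python dataclasses. The class and field names are hypothetical, not an agreed schema; the sample values are the folios, image names and witnesses recorded elsewhere on this page, and only the pages discussed there are listed.

```python
from dataclasses import dataclass, field

@dataclass
class Page:                 # unit of work: one photographed folio side
    folio: str
    picture: str

@dataclass
class Deposition:           # unit of meaning: one witness statement
    number: int
    witness: str
    pages: list = field(default_factory=list)   # the Pages it runs across

@dataclass
class Case:                 # unit of meaning: groups many depositions
    summary: str
    depositions: list = field(default_factory=list)

def depositions_on(case, folio):
    """Return the numbers of the depositions that appear on a given folio."""
    return [d.number for d in case.depositions
            if any(p.folio == folio for p in d.pages)]

p99v = Page("HCA 13/71 f.99v", "P1130401")
p100r = Page("HCA 13/71 f.100r", "P1130402")
p100v = Page("HCA 13/71 f.100v", "P1130403")
p101r = Page("HCA 13/71 f.101r", "P1130404")
p101v = Page("HCA 13/71 f.101v", "P1130405")

case = Case(
    summary="The Lord Protector against a certaine shipp called the fortune",
    depositions=[
        Deposition(4, "Charles Anquestil", [p99v, p100r, p100v]),
        Deposition(5, "John Mercier", [p100v, p101r, p101v]),
    ],
)
print(depositions_on(case, "HCA 13/71 f.100v"))
```

One page can belong to several depositions and one deposition to several pages, which is why the picture-level header alone cannot carry the structure of the source.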

I feel this is important, and also my bad not to have asked before. All the best,



Giovanni to Patrizia, Charlene, Stuart, Colin, Jill, William; 31/08/12: 08:36


I'm happy to see Patrizia and I broadly concur on our views: I basically agree with everything. Just a couple of points:

- about the infamous Callice in ffrance: as I said before, I agree we either mark just Callice, or the whole Callice in ffrance as a single place (the reason might be that whoever said/wrote it found it useful to disambiguate, even if in this case it's straightforward). About marking countries in a separate way, well for me it's no issue: I made only a generic place tag to avoid overwhelming transcribers with buttons, but it would be better for us markuppers not to have places, but cities, countries, etc. If you like, we can have the place tag perform like the currency or date one, with a value in which the transcriber can specify what kind of place it is: <place value="city">Callice</place> in ffrance. Bear in mind it's hard to find the right balance between richness of input provided and ease of use.

- I'm finally starting to appreciate what Patrizia has in mind for the database: we'll just need to discuss the technology, but I share the aim. I also think we agreed that a) we cannot split transcription and markupping, but we can't have the two done wholly at the same time, so a finer grain markup plus database building will follow (especially dealing with non-palaeographic but more semantic stuff) b) we might need a few more months to do it properly c) we should have more people helping us on that task too. So, this provided, my initial assumption of processing almost everything automatically due to time and workforce constraints is no longer that imperative. This probably leaves a bit more room for finer grain tagging by transcribers.
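
As an illustration of the typed place tag suggested in the first point, the sketch below groups tagged places by their value attribute. It assumes Python's standard xml.etree.ElementTree; the function name places_by_kind, the wrapping <line> element and the "unspecified" fallback are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def places_by_kind(fragment):
    """Group the text of <place> tags by their value attribute."""
    kinds = defaultdict(list)
    for el in ET.fromstring(fragment).iter("place"):
        # A tag without a value attribute is kept, but marked as untyped.
        kinds[el.get("value", "unspecified")].append(el.text)
    return dict(kinds)

typed = '<line>of <place value="city">Callice</place> in ffrance</line>'
untyped = '<line>of <place>Callice</place> in ffrance</line>'
print(places_by_kind(typed))
print(places_by_kind(untyped))
```

Transcribers who leave the value out would still produce usable markup, which may help with the balance between richness and ease of use.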



Patrizia to Giovanni, Colin 30/08/12; 23:25


I tried to catch up some issues, probably in an untidy way. I hope something is useful.
Giovanni, tell me if you read my comments in our area. I'm not clear how it works.
I'll try to make an example of excel file for tomorrow, using one of the transcriptions of Colin.



Patrizia to Giovanni, Colin, Charlene, Stuart, Jill, William: 30/08/12: 23:22


PATRIZIA: Only some thoughts about some (not all) of the questions. As Colin said, I am travelling, and see pages only in a very limited and uncomfortable way. Sorry if some comment is unclear.



COLIN: For example:

The button ship: do we highlight "fortune", or "the fortune", or "the shipp the fortune"? Does it matter? Giovanni, the "ship" category is currently not displaying in colour in the HTML driven publication of the transcribed page at the bottom (it displays as e.g. "the said shipp < style="color:blue">fortune")

PATRIZIA: Yes, it matters. If you include the article in the tag, everything will be sorted under 'the'. Think how the names are painted on true ships: it's likely 'Fortune', not 'The Fortune.' It's the same problem I already highlighted in my last email about 'said' before the name of persons
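
The sorting problem is easy to reproduce. A small sketch, using ship names from the transcriptions; the function name index_order is hypothetical:

```python
def index_order(names):
    """Sort names the way a simple alphabetical index would."""
    return sorted(names, key=str.lower)

with_article = ["the golden Eagle", "fortune", "the Royal Mary"]
without_article = ["golden Eagle", "fortune", "Royal Mary"]

# Tagged with the article, two of the three ships file under "t".
print(index_order(with_article))
# Tagged without it, each ship files under its own name.
print(index_order(without_article))
```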



COLIN: The button "person": do we highlight only personal names, or do we include clear individuals such as "the king of Spain", or should it be "King of Spain".

PATRIZIA: Idem: 'King of Spain', and absolutely yes, include him (as it is). Then it is a problem of the database to give him a name, like here: We saw the king and queen during Mass. It's from Mozart's letters.

If you hover with the mouse over king and queen you read 'Ferdinando IV di Borbone Napoli, born 12/01/1752, died 04/01/1825' and Maria Caroltta (Carolina) d'Asburgo-Lorena - born 13/08/1752, died 07/09/1814.



COLIN: Another example would be the Lord Protector, which I have marked up as The Lord Protector (The <person>Lord Protector </person>against).

PATRIZIA: Sorry: be careful again with spaces within the tags.
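
Stray spaces inside tags could be caught by a simple check before sign-off. This is a sketch only: the function name padded_tags is hypothetical, and the pattern deliberately ignores tags that contain other tags.

```python
import re

# Matches a tag whose text content begins or ends with whitespace,
# e.g. <person>Lord Protector </person>. Nested tags are not examined.
PADDED = re.compile(r"<(\w[\w-]*)(?:\s[^>]*)?>(\s+[^<]*|[^<]*\s+)</\1>")

def padded_tags(markup):
    """Return the names of tags whose content has leading or trailing spaces."""
    return [m.group(1) for m in PADDED.finditer(markup)]

print(padded_tags("The <person>Lord Protector </person>against"))
print(padded_tags("The <person>Lord Protector</person> against"))
```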

COLIN: This is being used as a name, despite being a title (or is it an occupation?). Once we have the occupation button, would we mark this up in preference as an occupation, or is it both a name and an occupation and requires double markup? If it requires double markup, is there a syntax which requires one to come within the other? (I don't see any brackets or other syntax-like devices being generated in the HTML code)

PATRIZIA: No, I would say that this is only a way to refer to the person. It is not the same case as

<person>Charles Anquestil</person>, <profession>Mariner</profession> and <profession>Gunner</profession>

In this case 'mariner' is a predicate of the person, and indicates his profession (i.e.: I find your mark correct), with the already mentioned warning that I would mark 'gunner' with a tag 'role' or 'title', as you find best (after all I find that 'role' would better suit).
If I may take again an example from Mozart's letter, see this:

http://letters.mozartways.com/index.php?lang=eng&theme=people&name=1200&alpha=C

You see that Antonio Colonna Branciforte is mentioned in letter 171. If you click on 'View', you will see the term 'Cardinal' highlighted within the text of letter 171.

'Cardinal' is a way to refer to the 'person' Antonio Colonna Branciforte, who had, in time, different roles:

1. Assistente al Soglio Pontificio (27/02/1754)
2. Nunzio Apostolico a Venezia (02/04/1754)
3. Cardinale (06/04/1766)
4. Legato Pontificio a Bologna (1769 — 1775)



COLIN: Would "King of Spain" be marked up both as a person and "Spain" as a place, or is this a clear case of where the transcriber is distinguishing person and place?

PATRIZIA: No, again: King of Spain is only a way to refer to that person. Semantically, it does not have reference to a place.



COLIN: In the case of "place" I have assumed (i.e. made an "editorial policy assumption") that compound places will be marked up twice, e.g. "of Callice in ffrance" is marked up as "of <place>Callice</place> in <place>ffrance</place>".

PATRIZIA: See my previous email. The document mentions one place, not two. France is again an attribute of Callice. We can tag the countries, if we find it useful. (This is not needed to find out where Callice is, because this is done with the database, but it could be useful in all the cases where ONLY a country is mentioned. Giovanni, what do you think?)



COLIN: If we wanted to use markup (converted into TEI compliant markup) to drive searches such as "How many legal depositions refer to French war ships (as opposed to merchant ships) in the (English) Channel?", you would need to know that "the golden Eagle of Callice" and "the Royal Mary" were French ships and were ships of war, and that Callice (presumably Calais), and other ports such as Dinkirk (with its spelling variants), had been grouped for the purpose of the search under the broader term "(English) Channel"

PATRIZIA: I agree that these will likely be normal searches that people will perform on such a website, but this is the task of the database, in my opinion. The lack of a relational structure (versus a textual search) is exactly what does not allow you to answer these questions correctly. I sent a paper to Giovanni yesterday that highlights the problem very well. Consider the wonderful Old Bailey Proceedings. Despite being a fantastic resource, there is no way to say whether two persons with the same name are two persons or a double reference to the same person. This is why for Mozart we use a relational database, and why I think that we should do the same for MarineLives.
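
A minimal sketch of the relational idea, using Python's built-in sqlite3 module. The table and column names are hypothetical, and the example assumes that the "John Mercier" of the front matter on f.100v and the signature read as "Feban Merchier" on f.101v are the same man, which is an editorial judgement rather than an established fact.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, canonical_name TEXT);
CREATE TABLE mention (id INTEGER PRIMARY KEY, folio TEXT, written_as TEXT,
                      person_id INTEGER REFERENCES person(id));
""")

# One row per real person; one row per appearance of a name in the text.
db.execute("INSERT INTO person VALUES (1, 'John Mercier')")
db.executemany(
    "INSERT INTO mention (folio, written_as, person_id) VALUES (?, ?, ?)",
    [("HCA 13/71 f.100v", "John Mercier", 1),
     ("HCA 13/71 f.101v", "Feban Merchier", 1)],
)

# Every spelling under which person 1 appears, in folio order.
rows = db.execute(
    "SELECT written_as FROM mention WHERE person_id = 1 ORDER BY folio"
).fetchall()
print(rows)
```

Because a mention points at a person row rather than at a string, two men with the same name would simply be two person rows, and one man with two spellings remains one.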



COLIN: I am also clear (for discussion) that we need the transcribers first to create a "clean" transcription, without using any category buttons, and that this then needs review and perfection and signoff, before the categories are added. Otherwise palaeographical questions and learning will get all mixed up with category editorial policy, and I think that is a very big ask for the first four weeks of transcription post training. So I think team facilitators, to the extent that they are acting as page editors, will need to take two passes at each page. For discussion please.

PATRIZIA: Colin: this is probably very, very wise. In Italy we say that Rome was not made in a day. It's already so troubling to get through the paleographical issues, that probably you would better concentrate on this. We can easily go back to these discussions after 3/4 weeks, no?


COLIN: I also understand Giovanni's (and Patrizia's) points about the weakness of Scripto being the absence of a cumulative aggregating robust database, which accumulates the category markup, so that at any one time you can inspect all places, people, etc. input by transcribers to date. However, that is clearly not soluble for this project. It does mean that any "nice to have" but NOT essential functionality such as mapping out places referred to in the transcribed documents would have to be done as a one-off piece of analysis, almost certainly on a sample basis. Playing with mapped data looks like it can only really happen after we have that robust database, which will be generated by the markup/analysis team

PATRIZIA: Totally agree.



COLIN: Finally, it is clear to me that we should move any planned end of project conference from the tentative end of January to a tentative end of March or early April, to give us LOTS of wiggle time on the database/markup/analysis stage of the project

These are my thoughts for what they are worth. I am very keen to hear back as soon as possible from Stuart and Charlene, and also to get William and Jill's input.

I am also going to contact Dr Elaine Murphy today, who I am meeting in Cambridge on Thursday 6th September, to ask her if she would be prepared to take a look at our Scripto modifications and to comment.

Patrizia is leaving today for Austria, and will only be back properly into the conversation Friday week. She and Giovanni are clearly already establishing a very productive relationship, and will jointly be leading the database/semantic markup/analysis team, and will be joint facilitators of that team. I am looking to the two of them, once Patrizia is back, to produce an outline plan for their team, showing broad milestones, and very importantly providing an estimate of the numbers of associates they need on their team, with what sorts of prior experience and training.



Giovanni to Patrizia, Charlene, Colin, Jill, William; 30/08/12: 12:05


On Thu, Aug 30, 2012 at 12:05 AM, Giovanni Colavizza <giovannicolavizza@gmail.com> wrote:

A few general points follow, together with specific choices I propose, open for discussion.

After collecting everybody's suggestions, we are approaching a choice that I feel I ought to justify. We have many constraints to work within, which make this project both stimulating and limited: lack of time, crowdsourcing, lack of money and the need for open source solutions are the most important for me when it comes to our IT solution. I think we agreed that we will produce a semi-diplomatic transcription, leaving most of the document formatting out. At the same time we plan to build as rich a database as possible for further use of this data. Given all this, we need an easy-to-use environment, knowing that the tuning of the markup and the actual building of the database will follow, and will be done by different people from the transcribers. Yet the burden of highlighting meaningful information while transcribing falls on the transcribers' shoulders, as we all agreed this cannot be avoided. Afterwards, we will adjust the markup to be TEI compliant (with customisations) and build a database for further use.

Scripto is very probably going to be our solution, not because it's perfect (far from it), but because I fail to see any other free and easy-to-use framework that gives us the same amount of customization (even if not native). What it really lacks, and what concerns me, are two things: the possibility of building a richer database as we go (with all ships, persons, etc.), and support for consistency checks to help transcribers and reduce mistakes (the two things go together). Yet I fail to see comparable solutions providing these things, so we'll need extra care from transcribers and facilitators, as well as subsequent work on the database. So, I'm aware it's not a perfect solution, but I hope it'll do.

Let's get specific on what I plan to add to basic Scripto (please also refer to Patrizia's mail attached below):

- pretty much everything Charlene required. For now I'm just refraining from text align buttons (if we add those, many other format properties should be added too: might we leave all this out and just rely on the image displayed alongside the transcription during future use? This is what Patrizia suggests, and I endorse it). Would it perhaps be preferable to just add a tag for marginal/inserted text, as Charlene proposed?

- everything Patrizia suggests (see below), except abbreviations, dashes and ampersands, for which I suggest we stick to Charlene's policy and deal with TEI requirements later on.

- special things that will be tagged are: persons, places, ships, commodities, currencies, weights, measures, titles/professions. Should we add behaviours/values? We'll need some rules of thumb for transcribers if so, but I think it might be worth a try. Anything else? Please, tell me.
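As a rough illustration of what the later database stage could do with such tags, the sketch below collects tagged mentions from transcribed pages into one index per category. The tag names (`<person>`, `<ship>`, etc.), the page identifiers and the function name are assumptions made for the example, not the markup actually agreed for Scripto.

```python
import re
from collections import defaultdict

# Hypothetical tag names, one per category listed above.
CATEGORIES = ["person", "place", "ship", "commodity",
              "currency", "weight", "measure", "title"]

# Matches <tag>content</tag> for any of the categories (no nesting handled).
TAG_PATTERN = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(CATEGORIES), re.DOTALL)


def collect_mentions(pages):
    """Return {category: [(mention text, page id), ...]} for all pages.

    `pages` maps a page identifier to its transcribed, tagged text.
    """
    index = defaultdict(list)
    for page_id, text in pages.items():
        for category, value in TAG_PATTERN.findall(text):
            index[category].append((value.strip(), page_id))
    return dict(index)


# Invented sample pages, for illustration only.
sample_pages = {
    "p1": "<person>John Smith</person> master of the <ship>Hope</ship>",
    "p2": "<person>John Smith</person> then at <place>Dunkirk</place>",
}
mentions = collect_mentions(sample_pages)
```

Note that the index records mentions, not identities: the two "John Smith" entries are kept apart by page, and nothing here decides whether they are the same man.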

Patrizia's idea of tracking ownership brings us to a major problem, I think: unique identifiers for things, and relations between data. Ideally we should strive for a good transcription with a prosopographical framework built upon it, to allow navigation through the information. This is a huge amount of work, ideally done with semantic web tools, which we won't use now. My question is: how, and to what extent, can we move towards such an end with the means we have? We do not have a growing database to work on, which would allow us to identify things uniquely and add data/relations to them. As each transcriber works, s/he might find mentions of a person already named in other documents, and we won't be able to assume the two are the same (or that they are different, even if they share a name). Some work to this end can and will be done during database building, but I think we'll still end up with unlinked data. While I plan to hear more from Patrizia about this, and to follow her judgement given her ample experience, I would like to hear from you all.

Also, any other idea to improve Scripto, or even a last minute new fancy solution, is welcome. All the best,



Patrizia to Giovanni, Colin: 29/08/12; XXX


2012/8/29 Patrizia Rebulla <patrizia.rebulla@gmail.com>

Hi Colin, hi Giovanni,

We all talked together today. Just to sum up before leaving (Giovanni already had a longer and more detailed mail on technicalities), I'd suggest the following for the moment:

  • add a menu that opens the letters with diacritics and accents (Giovanni got my list)


  • add a button for


- currencies (20 shillings)
- weights (ship of 50 tons; 8 oz of butter...)
- measures (Dunkirk is 20 miles from Southampton; this ship is 18 feet long)
- title (this goes beyond occupation: in times of privateering, when a war started, somebody who was a fisherman could become the master of a ship; somebody who was a merchant could become an alderman)
- ownership (the ship belonged to...; then I know we will have the problem of the proportion of ownership: one third, one half. I'd like to record it, but we can discuss it upon my return. I have some ideas)

  • I'd keep the diplomatic aspects of the papers to a minimum and concentrate more on the content. What I proposed to Giovanni is to use these tags, which are TEI standards:


- misspellings can be marked with the <sic> tag. This is used with the 'corr' attribute to reassure the reader that this is not a faulty transcription, e.g.: "but rather shaken by their <sic corr="nervewracking">nerveracking</sic>"

- abbreviations, if we want to expand them (for readers' sake), have this tag altho

  • I saw some notes about dashes and ampersands. If we want to leave them as they are, in TEI they are replaced using ISO values. Dashes are characterised as &mdash; and ampersands as &amp;


  • Finally, as discussed today with Giovanni, we need to save dates not only in their narrative form (8 June 1654) but also in their date_form (08/06/1654). This allows us to display the papers in chronological order.
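
A minimal sketch of the date point, assuming narrative dates always follow the "day Month year" pattern of the example above. The function names are invented for illustration, and the sketch does not attempt to handle Old Style year conventions or partial dates.

```python
from datetime import date

# English month names mapped to month numbers (avoids locale-dependent parsing).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def parse_narrative_date(text):
    """Turn a narrative date such as '8 June 1654' into a date object."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))


def to_date_form(text):
    """Return the dd/mm/yyyy date_form for a narrative date."""
    d = parse_narrative_date(text)
    return f"{d.day:02d}/{d.month:02d}/{d.year:04d}"


def sort_chronologically(narrative_dates):
    """Sort narrative dates by the actual date they denote."""
    return sorted(narrative_dates, key=parse_narrative_date)


print(to_date_form("8 June 1654"))
print(sort_chronologically(["8 June 1654", "3 January 1654", "21 December 1653"]))
```

Sorting on the parsed date rather than on the dd/mm/yyyy string matters: sorted as plain text, 08/06/1654 would come before 21/12/1653.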


Sorry for this short and untidy note. I'm really struggling with time. Hope this helps.



Queries to team leaders

Colin



