Team Giovanni/Patrizia

From MarineLives
Revision as of 07:31, August 31, 2012 by ColinGreenstreet (Talk | contribs)



Team Colin

Editorial history

23/08/12: CSG, created page






Suggested links


Team Colin
Team Jill
Team William

TEI: Text Encoding Initiative
TEI Lite



Tasks for the week



Week commencing 20th August 2012




Week commencing 30th August 2012


- Patrizia, how should we deal with quantities and currencies database-wise? <quantity value="hour">6. howers</quantity> OR <quantity value="hour">6</quantity>. howers? I suggested the latter.
Giovanni: I agree.
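
To illustrate why the second form is easier to handle database-wise, here is a minimal Python sketch. It assumes the markup ends up as well-formed XML; the surrounding <line> element and the function name are invented for the example and are not part of any agreed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription fragment using the second form discussed above:
# the tag wraps only the number, and the unit word stays outside as plain text.
fragment = '<line>about <quantity value="hour">6</quantity>. howers after</line>'

def extract_quantities(xml_text):
    """Return (unit, amount) pairs for every <quantity> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for q in root.iter("quantity"):
        unit = q.get("value")          # e.g. "hour"
        amount = int(q.text.strip())   # a clean number, ready for a numeric column
        pairs.append((unit, amount))
    return pairs

print(extract_quantities(fragment))
```

With the first form, the element text would be "6. howers", so the conversion to a number would need extra cleaning before it could be stored.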

- We'll need to decide which elements we want in the header, to mimic some of a TEI one. If you manage to give a thought about this, it would be great.
I'm afraid I'm not yet familiar with the structure of the papers. Which elements do we want to include?

- One further concern: units of meaning in the text (depositions, cases, etc.). We need to identify them properly I think. Followup: we should probably have one header per document (ie picture), and a separate header for cases, depositions (each case contains many, scattered across different pages). Ideally we'll have some metadata to associate the document with a picture, but we'll also represent the units of meaning of the source, which is paramount.
Patrizia: isn't this something that should mimic the structure of the national archives? (I mean the documentary unit)



Week commencing 3rd September 2012




Useful email records

Patrizia to Giovanni, Colin 30/08/12: 23:25


I tried to catch up on some issues, probably in an untidy way. I hope some of it is useful.
Giovanni, tell me if you read my comments in our area. I'm not clear how it works.
I'll try to make an example Excel file for tomorrow, using one of Colin's transcriptions.



Patrizia to Giovanni, Colin, Charlene, Stuart, Jill, William: 30/08/12: 23:22


PATRIZIA: Only some thoughts about some (not all) of the questions. As Colin said, I am travelling, and see pages only in a very limited and uncomfortable way. Sorry if some comment is unclear.



COLIN: For example:

The button ship: do we highlight "fortune", or "the fortune", or "the shipp the fortune"? Does it matter? Giovanni, the "ship" category is currently not displaying in colour in the HTML-driven publication of the transcribed page at the bottom (it displays as e.g. "the said shipp < style="color:blue">fortune").

PATRIZIA: Yes, it matters. If you include the article in the tag, everything will be sorted under 'the'. Think how the names are painted on real ships: it's likely 'Fortune', not 'The Fortune'. It's the same problem I already highlighted in my last email about 'said' before the names of persons.
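
The sorting problem can be seen in a small Python sketch. The list of tagged names and the article list are invented for illustration; the point is only that names tagged with the article need a cleanup step that names tagged without it do not.

```python
# Hypothetical tagged ship names, as different transcribers might highlight them.
tagged = ["the Fortune", "Royal Mary", "The golden Eagle", "Fortune"]

ARTICLES = ("the ", "a ", "an ")

def sort_key(name):
    """Lower-case the name and drop one leading article, if present."""
    lowered = name.strip().lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered

# Sorting on the raw tag content groups two of the ships under "the".
naive = sorted(tagged, key=str.lower)
# Sorting on the cleaned key puts both mentions of the Fortune together.
cleaned = sorted(tagged, key=sort_key)

print(naive)
print(cleaned)
```

If the article is kept outside the tag in the first place, no such cleanup is needed.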



COLIN: The button "person": do we highlight only personal names, or do we include clear individuals such as "the king of Spain", or should it be "King of Spain"?

PATRIZIA: Idem: 'King of Spain', and absolutely yes, include him (as it is). It is then the job of the database to give him a name, as here: We saw the king and queen during Mass. It's from Mozart's letters.



If you hover with the mouse over king and queen you read 'Ferdinando IV di Borbone Napoli, born 12/01/1752, died 04/01/1825' and 'Maria Carlotta (Carolina) d'Asburgo-Lorena - born 13/08/1752, died 07/09/1814'.



COLIN: Another example would be the Lord Protector, which I have marked up as The Lord Protector (The <person>Lord Protector </person>against).

PATRIZIA: Sorry: be careful again with spaces within the tags.
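
A check for stray spaces inside tags could be automated. The following Python sketch assumes simple, non-nested tags without attributes, which is a simplification of whatever markup Scripto finally produces; the function name is made up for the example.

```python
import re

# Matches <tag>content</tag> for simple, non-nested tags without attributes.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")

def find_padded_tags(text):
    """Return (tag, content) pairs whose content starts or ends with whitespace."""
    problems = []
    for match in TAG_PATTERN.finditer(text):
        tag, content = match.group(1), match.group(2)
        if content != content.strip():
            problems.append((tag, content))
    return problems

sample = "The <person>Lord Protector </person>against <person>Charles Anquestil</person>"
print(find_padded_tags(sample))
```

Run on a transcribed page, this would list only the tags that need a facilitator's attention, here the trailing space after "Lord Protector".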

This is being used as a name, despite being a title (or is it an occupation?). Once we have the occupation button, would we mark this up in preference as an occupation, or is it both a name and an occupation and requires double markup? If it requires double markup, is there a syntax which requires one to come within the other? (I don't see any brackets or other syntax-like devices being generated in the HTML code.)

PATRIZIA: No, I would say that this is only a way to refer to the person. It is not the same case as



COLIN: <person>Charles Anquestil</person>, <profession>Mariner</profession> and <profession>Gunner</profession>

PATRIZIA: In this case 'mariner' is a predicate of the person, and indicates his profession (i.e. I find your markup correct), with the already mentioned warning that I would mark 'gunner' with a tag 'role' or 'title', as you find best (on reflection I find that 'role' would suit better).
If I may take again an example from Mozart's letter, see this:

http://letters.mozartways.com/index.php?lang=eng&theme=people&name=1200&alpha=C

You see that Antonio Colonna Branciforte is mentioned in letter 171. If you click on 'View', you will see the term 'Cardinal' highlighted within the text of letter 171.

'Cardinal' is a way to refer to the 'person' Antonio Colonna Branciforte, who had, in time, different roles:

1. Assistente al Soglio Pontificio (27/02/1754)
2. Nunzio Apostolico a Venezia (02/04/1754)
3. Cardinale (06/04/1766)
4. Legato Pontificio a Bologna (1769 — 1775)



COLIN: Would "King of Spain" be marked up both as a person and "Spain" as a place, or is this a clear case of where the transcriber is distinguishing person and place?

PATRIZIA: No, again: King of Spain is only a way to refer to that person. Semantically, it does not have reference to a place.



COLIN: In the case of "place" I have assumed (i.e. made an editorial policy assumption) that compound places will be marked up twice, e.g. "of Callice in ffrance" is marked up as "of <place>Callice</place> in <place>ffrance</place>".

PATRIZIA: See my previous email. The document mentions one place, not two. France is again an attribute of Callice. We can tag the countries, if we find it useful. (This is not needed to find out where Callice is, because this is done with the database, but it could be useful in all the cases where ONLY a country is mentioned. Giovanni, what do you think?)



COLIN: If we wanted to use markup (converted into TEI compliant markup) to drive searches such as "How many legal depositions refer to French war ships (as opposed to merchant ships) in the (English) Channel?", you would need to know that "the golden Eagle of Callice" and "the Royal Mary" were French ships and were ships of war, and that Callice (presumably Calais) and other ports such as Dunkirk (with its spelling variants) had been grouped for the purpose of the search under the broader term "(English) Channel".

PATRIZIA: I agree that these will likely be normal searches that people will perform on such a website, but this is the task of the database, in my opinion. The lack of a relational structure (versus a textual search) is exactly what does not allow you to answer these questions correctly. I sent a paper to Giovanni yesterday that highlights the problem very well. Consider the wonderful Old Bailey Proceedings. Despite being a fantastic resource, there is no way to say if two persons with the same name are two persons or a double reference to the same person. This is why for Mozart we use a relational database, and why I think that we should do the same for MarineLives.
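
The difference between a textual search and a relational structure can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows below are invented for illustration and are not a proposed MarineLives schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    note TEXT
);
CREATE TABLE mention (
    id              INTEGER PRIMARY KEY,
    person_id       INTEGER NOT NULL REFERENCES person(id),
    page            TEXT NOT NULL,
    text_as_written TEXT NOT NULL
);
""")

# Invented sample data: two different men who share a name.
conn.executemany(
    "INSERT INTO person (id, name, note) VALUES (?, ?, ?)",
    [(1, "John Smith", "mariner"), (2, "John Smith", "merchant")],
)
conn.executemany(
    "INSERT INTO mention (person_id, page, text_as_written) VALUES (?, ?, ?)",
    [
        (1, "page-001", "the said John Smith"),
        (1, "page-007", "John Smyth"),
        (2, "page-003", "John Smith"),
    ],
)

# A text search for "John Smith" would return mentions without saying whose
# they are; the join keeps the two individuals apart and counts each one.
rows = conn.execute("""
    SELECT person.id, person.note, COUNT(mention.id)
    FROM person JOIN mention ON mention.person_id = person.id
    WHERE person.name = 'John Smith'
    GROUP BY person.id
    ORDER BY person.id
""").fetchall()
print(rows)
```

Each mention points at a person record through its identifier, so spelling variants such as "John Smyth" and shared names are resolved once, by an editor, rather than at every search.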



COLIN: I am also clear (for discussion) that we need the transcribers first to create a "clean" transcription, without using any category buttons, and that this then needs review and perfection and signoff, before the categories are added. Otherwise palaeographical questions and learning will get all mixed up with category editorial policy, and I think that is a very big ask for the first four weeks of transcription post training. So I think team facilitators, to the extent that they are acting as page editors, will need to take two passes at each page. For discussion please.

PATRIZIA: Colin: this is probably very, very wise. In Italy we say that Rome was not made in a day. It's already so troublesome to get through the palaeographical issues that you had probably better concentrate on this. We can easily go back to these discussions after 3/4 weeks, no?


COLIN: I also understand Giovanni's (and Patrizia's) points about the weakness of Scripto being the absence of a cumulative, aggregating, robust database, which accumulates the category markup, so that at any one time you can inspect all places, people, etc. input by transcribers to date. However, that is clearly not soluble for this project. It does mean that any "nice to have" but NOT essential functionality, such as mapping out places referred to in the transcribed documents, would have to be done as a one-off piece of analysis, almost certainly on a sample basis. Playing with mapped data looks like it can only really happen after we have that robust database, which will be generated by the markup/analysis team.

PATRIZIA: Totally agree.
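
As a rough idea of what a one-off analysis over a sample of pages might look like, the Python sketch below collects tagged items from a handful of page texts into a single index. The page identifiers, page texts and simple tag syntax are assumptions made for the example; this is not a feature of Scripto.

```python
import re
from collections import defaultdict

# Matches <tag>content</tag> for simple, non-nested tags without attributes.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")

# Invented sample of transcribed pages.
pages = {
    "page-001": "of <place>Callice</place> in ffrance, the <ship>Fortune</ship>",
    "page-002": "the <ship>Royal Mary</ship> bound for <place>Callice</place>",
}

def build_index(pages):
    """Map each category to its tagged items and the pages they occur on."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for tag, content in TAG_PATTERN.findall(text):
            index[tag][content.strip()].append(page_id)
    return index

index = build_index(pages)
print(dict(index["place"]))
print(sorted(index["ship"]))
```

Such an index has to be rebuilt from scratch each time, which is exactly the limitation discussed above.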



COLIN: Finally, it is clear to me that we should move any planned end of project conference from the tentative end of January to a tentative end of March or early April, to give us LOTS of wiggle time on the database/markup/analysis stage of the project.

These are my thoughts for what they are worth. I am very keen to hear back as soon as possible from Stuart and Charlene, and also to get William and Jill's input.

I am also going to contact Dr Elaine Murphy today, who I am meeting in Cambridge on Thursday 6th September, to ask her if she would be prepared to take a look at our Scripto modifications and to comment.

Patrizia is leaving today for Austria, and will only be back properly into the conversation Friday week. She and Giovanni are clearly already establishing a very productive relationship, and will jointly be leading the database/semantic markup/analysis team, and will be joint facilitators of that team. I am looking to the two of them, once Patrizia is back, to produce an outline plan for their team, showing broad milestones, and very importantly providing an estimate of the numbers of associates they need on their team, with what sorts of prior experience and training.



Giovanni to Patrizia, Charlene, Colin, Jill, William: 30/08/12: 12:05


On Thu, Aug 30, 2012 at 12:05 AM, Giovanni Colavizza <giovannicolavizza@gmail.com> wrote:

A few general points follow, together with specific choices I propose, open for discussion.

After collecting suggestions from everybody, we're approaching a choice that I feel I should justify. We have many assumptions to make, which render this project both stimulating and constrained: lack of time, crowdsourcing, lack of money and the need for open source solutions are the most important for me in choosing our IT solution. I think we agreed that we'll have a semi-diplomatic transcription, leaving most of the document formatting out. At the same time we plan to build as rich a database as possible for further use of this data. Given all this, we needed an easy-to-use environment, knowing that the tuning of the markup and the actual build-up of the database will follow, and will be done by different people from the transcribers. Yet the burden of highlighting meaningful information while transcribing falls on their shoulders, as we all agreed this cannot be avoided. Afterwards, we'll adjust the markup to be TEI compliant (with customisations) and build a database, for further use.

Scripto is very probably going to be our solution, not because it's perfect (far from it), but because I fail to see any other free and easy-to-use framework that gives us the same amount of customization (even if not native). What it really lacks, and this is a concern for me, are two things: the possibility to build a richer database as we go (with all ships, persons, etc.), and support for consistency checks to help transcribers and reduce mistakes (the two things go together). Yet I fail to see comparable solutions providing these things, so we'll need extra care from transcribers and facilitators, as well as subsequent work on the database. So, I'm aware it's not a perfect solution, but I hope it'll do.

Let's get specific on what I plan to add to basic Scripto (please also refer to Patrizia's mail attached below):

- pretty much everything Charlene required. For now I'm just refraining from text align buttons (if we do those, many other format properties should be added: we might perhaps leave all this out and just assume the image will sit beside the transcription during future use? This is what Patrizia suggests, and I endorse it). Would it perhaps be preferable to just add a tag for marginal/inserted text, as Charlene proposed?

- everything Patrizia suggests (see below), except abbreviations, dashes and ampersands, for which I suggest we stick to Charlene's policy, and deal with TEI requirements later on.

- special things that will be tagged are: persons, places, ships, commodities, currencies, weights, measures, titles/professions. Should we add behaviours/values? We'll need some rules of thumb for transcribers if so, but I think it might be worth a try. Anything else? Please, tell me.

Patrizia's idea to track ownership brings us to a major problem, I think: unique identifiers for things and relations between data. Ideally we should strive for a good transcription and a prosopographical framework built upon it, to allow navigation throughout the information. This is huge work, ideally done with semantic web tools, which we won't use now. My question is, how and to what extent can we approach and move towards such an end, with our possibilities? We do not have a growing database to work upon, which would allow us to uniquely identify things and add data/relations to them. As each transcriber works, s/he might find mentions of a person already named in other documents, and we won't be able to assume the two are the same (or are different even if they share a name). Some work to this end can and will be done during database building, but I think we'll still end up with unlinked data. While I plan to hear more from Patrizia about this, and to follow her judgement considering her ample experience, I would like to hear from you all.

Also, any other idea to improve Scripto, or even a last minute new fancy solution, is welcome. All the best,



Patrizia to Giovanni, Colin: 29/08/12: XXX


2012/8/29 Patrizia Rebulla <patrizia.rebulla@gmail.com>

Hi Colin, hi Giovanni,

We all talked together today. Only to sum it up before leaving (Giovanni already had a longer and more detailed mail on technicalities), I'd suggest for the moment the following:

  • add a menu that opens the letters with diacritics and accents (Giovanni got my list)


  • add a button for


- currencies (20 shillings)
- weights (ship of 50 tons; 8 oz of butter...)
- measures (Dunkirk is 20 miles from Southampton; this ship is 18 feet long)
- title (this is beyond occupation: in time of privateering, when a war started somebody who was a fisherman could become the master of a ship; somebody who is a merchant could become an alderman)
- ownership (the ship belonged to...; then I know we will have the problem of the proportion of ownership: one third, the half. I'd like to save it, but we may discuss it upon my return. I have some ideas)

  • I'd limit the diplomatic aspects of the papers to the minimum, and concentrate more on the content. What I proposed to Giovanni is to use these tags, which are TEI standards:


- misspellings can be marked with the <sic> tag. This is used with the 'corr' attribute to reassure the reader that this is not a faulty transcription, e.g.: but rather shaken by their <sic corr="nervewracking">nerveracking</sic>

- abbreviations, if we want to expand them (for readers' sake), have this tag altho

  • I saw some notes about dashes and ampersands. If we want to leave them as they are, in TEI they are replaced using ISO values. Dashes are characterised as &mdash; and ampersands as &amp;


  • Finally, as said today with Giovanni, we need to save dates not only in their narrative form (8 June 1654) but also in their date_form (08/06/1654). This allows us to display the papers in chronological order.
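
As an illustration of keeping both forms, the Python sketch below derives a date_form from a narrative date and uses it to order documents. It assumes narrative dates of the shape "day month year" with English month names, takes the written date at face value without any calendar conversion, and uses invented function names and sample dates.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

def parse_narrative(text):
    """Turn a narrative date such as '8 June 1654' into a date object."""
    day, month, year = text.split()
    return date(int(year), MONTHS[month], int(day))

def date_form(d):
    """Format a date as DD/MM/YYYY, e.g. 08/06/1654."""
    return f"{d.day:02d}/{d.month:02d}/{d.year:04d}"

# Invented narrative dates, in the order they might be transcribed.
narrative = ["8 June 1654", "21 March 1653", "2 December 1654"]

# Sort on the parsed date, not on the DD/MM/YYYY string, which would
# order by day first.
ordered = sorted(narrative, key=parse_narrative)
print([date_form(parse_narrative(n)) for n in ordered])
```

The narrative form stays in the transcription as written; only the derived value is used for ordering.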


Sorry for this short and untidy note. I'm really struggling with time. Hope this helps.



Queries to team leaders

Colin