Tools: Team Three

From MarineLives
Revision as of 11:07, June 26, 2016 by ColinGreenstreet (Talk | contribs)


Team three: visualisation of historical data

Team summary


We will explore how visualisation techniques can be used by historians for multiple purposes - to improve the discoverability of data, to highlight and analyse linkages in data, and to aid the comprehension of data.

We will undertake an analysis of our own needs as historians and will explore how software designers have approached meeting those needs.

An explicit goal of team three is to understand the visualisation potential of the MarineLives full text corpus and to explore approaches to mining the data for visualisation applications.
We would like to explore the use of an off-the-shelf Named Entity Recogniser to detect places, ships and dates, and to visualise the results in multiple ways and for multiple analytical purposes. We would like to compare this automated generation of tagged data with the hand extraction of geographical and other tagged data. We will build on earlier work done in collaboration with the Department of Informatics at the University of Mannheim.
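One possible way to compare automated tagging with hand extraction is to treat the hand-tagged entities as the reference set and measure precision, recall and F1 for the automated output. The sketch below is illustrative only: the function name compare_tagging, the entity labels (PLACE, SHIP, DATE) and the sample annotations are all invented for the example and are not taken from the MarineLives corpus or the Mannheim work.

```python
def compare_tagging(auto, hand):
    """Score automatic tags against hand tags (treated as the reference)."""
    auto, hand = set(auto), set(hand)
    agreed = auto & hand  # entities where text and type both match
    precision = len(agreed) / len(auto) if auto else 0.0
    recall = len(agreed) / len(hand) if hand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if agreed else 0.0
    return precision, recall, f1


# Invented annotations for one passage: (entity text, entity type).
hand_tags = {
    ("London", "PLACE"),
    ("Lisbon", "PLACE"),
    ("Mary", "SHIP"),
    ("1654", "DATE"),
}
auto_tags = {
    ("London", "PLACE"),
    ("Lisbon", "PLACE"),
    ("Mary", "PERSON"),  # ship name mistaken for a person
}

precision, recall, f1 = compare_tagging(auto_tags, hand_tags)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Exact matching on text and type is a deliberately strict choice. A looser comparison (for example matching on text alone) would show how often the recogniser finds an entity but assigns it the wrong class, which may matter for ship names.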

Team members will have an opportunity to work with, and improve upon, a MarineLives dataset for C17th ship sailing times between ports and dwell time in ports.
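As a minimal sketch of how sailing times and dwell times might be derived from such a dataset, assume each record gives a port with an arrival date and a departure date for a single ship, listed in voyage order. The record layout, the function names and every date below are hypothetical and invented for illustration; the actual MarineLives dataset may be structured quite differently.

```python
from datetime import date

# Hypothetical port calls for one ship, in voyage order:
# (port, arrival date, departure date). All values are invented.
# Note: Python dates use the proleptic Gregorian calendar; day counts
# within the 1650s are unaffected by the Julian/Gregorian difference.
calls = [
    ("London", date(1655, 3, 1), date(1655, 3, 20)),
    ("Lisbon", date(1655, 4, 10), date(1655, 5, 2)),
    ("Livorno", date(1655, 6, 1), date(1655, 6, 15)),
]


def dwell_times(calls):
    """Days spent in each port: departure minus arrival."""
    return [(port, (dep - arr).days) for port, arr, dep in calls]


def sailing_times(calls):
    """Days at sea between consecutive ports: next arrival minus departure."""
    legs = []
    for (origin, _, dep), (dest, arr, _) in zip(calls, calls[1:]):
        legs.append((origin, dest, (arr - dep).days))
    return legs


for port, days in dwell_times(calls):
    print(f"{port}: {days} days in port")
for origin, dest, days in sailing_times(calls):
    print(f"{origin} to {dest}: {days} days sailing")
```

Output in this shape (origin, destination, days) could be passed directly to a charting tool for the long and short distance travel time visualisations.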


High Court of Admiralty dataset


High Court of Admiralty 1650s travel time dataset
Long distance travel time - simple visualisation
Short distance travel time - simple visualisation

[ADD DATA]



Visualisation tools


[ADD DATA]



Named Entity Recognisers

Stanford Named Entity Recogniser


"Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION).

Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models on labeled data, you can actually use this code to build sequence models for NER or any other task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) or Sutton and McCallum (2010) for more comprehensible introductions.)

The original CRF code is by Jenny Finkel. The feature extractors are by Dan Klein, Christopher Manning, and Jenny Finkel. Much of the documentation and usability is due to Anna Rafferty. More recent code development has been done by various Stanford NLP Group members.

Stanford NER is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation (look at the shell scripts and batch files included in the download), running as a server (look at NERServer in the sources jar file), and a Java API (look at the simple examples in the NERDemo.java file included in the download, and then at the javadocs). Stanford NER code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses."[1]
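Once the recogniser has been run from the command line, its output needs to be turned into a list of entities before it can be visualised. The sketch below assumes the output is in slash-tag form (each token followed by a slash and its label, with O for tokens outside any entity) and groups consecutive tokens that share a label. The function name group_entities and the sample sentence are invented for illustration; the sentence is not an extract from the High Court of Admiralty records.

```python
def group_entities(tagged_text, outside_label="O"):
    """Group consecutive same-label tokens from token/LABEL text into entities."""
    entities = []
    current_words, current_label = [], None
    for item in tagged_text.split():
        # Split on the last slash so tokens containing "/" stay intact.
        word, _, label = item.rpartition("/")
        if label == current_label and label != outside_label:
            current_words.append(word)  # continue the current entity
            continue
        if current_words:
            entities.append((" ".join(current_words), current_label))
        current_words = [word] if label != outside_label else []
        current_label = label
    if current_words:
        entities.append((" ".join(current_words), current_label))
    return entities


# Invented sample of slash-tagged output using the 3-class labels.
sample = (
    "The/O ship/O sailed/O from/O New/LOCATION England/LOCATION "
    "to/O London/LOCATION with/O John/PERSON Smith/PERSON ./O"
)

entities = group_entities(sample)
for text, label in entities:
    print(f"{label}: {text}")
```

One limitation of this simple grouping is that two different entities of the same class standing next to each other would be merged into one, so results for lists of names or places would need checking by hand.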



Stanford Named Entity Tagger


Stanford Named Entity Tagger - Extract from HCA 13/68 f.35r



Useful Links


Natural Language Processing Wikipedia article

Dominique Ritze et al., Named Entities in Court: The MarineLives Corpus (May, 2014)

Colin Greenstreet, 'How long did it take?', The Shipping News blog article, Mat 22, 2014

Stanford Natural Language Processing Group: Software > Stanford Named Entity Recognizer (NER)

[PowerPoint introduction to Named Entity Recognition]

Online Stanford Named Entity Tagger
  1. Stanford Natural Language Processing Group: Software > Stanford Named Entity Recognizer (NER), viewed 26/06/2016