Tools: Team Two

Team two: tailored search of historical documents

1 Team summary
2 Historians - we need your input
3 Interview notes on historical search strategies and use of search engines
4 Standard search box on the MarineLives wiki
5 Search Engine Examples with Semantic Aspects
6 Semantic search on MarineLives Semantic Media Wiki
7 Search screens
8 Useful links
9 Semantic Media Wiki Examples
10 Other historical content
11 Semantic mediawiki capabilities
12 Bibliography
13 Terminology

Team summary

Interviews with Historians

We are conducting a series of interviews with professional historians about their use of electronic search in support of their research strategies.

Recent interviews include social historians Dr Andy Burn (Durham University) and Dr James Brown (University of Sheffield) and maritime historian Dr Cathryn Pearce (University of Greenwich).

Dr Andy Burn, Durham University

Dr Andy Burn is a post-doctoral fellow at the University of Durham. His current research concerns "Social Relations and Everyday Life in England, 1500-1640", a Leverhulme-funded project led by Professor Andy Wood.^[1] The first year of this project involved extensive research across England in local record offices and archives in which Andy examined mainly legal documents generated by Church and National courts. Andy's research is now moving more online and will mine State Papers (using State Papers Online) as well as Early English Books Online (EEBO), plus local records accessed electronically. See academic profile

Click here to read the interview with Dr Andy Burn.

Dr James Brown, University of Sheffield

Dr James Brown is based at the University of Sheffield. He is affiliated both to the Sheffield HRI Digital group and to the Sheffield history faculty. He is currently one of two research associates on the project 'Intoxicants and Early Modernity: England, 1580-1740' (ESRC; PI: Professor Phil Withington). James completed his PhD at the University of Warwick on Inns, Taverns and Alehouses in Early Modern Southampton in 2008. Between 2009 and 2013 he was project coordinator and then digital project manager for 'Cultures of Knowledge: Networking the Republic of Letters, 1550-1750' at the University of Oxford (Mellon Foundation; PI: Professor Howard Hotson), overseeing (inter alia) the development of its union catalogue of sixteenth-, seventeenth-, and eighteenth-century correspondence, Early Modern Letters Online.

Click here to read the interview with Dr James Brown

Dr Cathryn Pearce, Visiting Lecturer, University of Greenwich

Dr Cathryn Pearce is an American maritime historian, living and working in the South West of England, who has studied and worked in Alaska, Canada and England. She was an active transcriber in the MarineLives project team back in 2012, when the project was first established. She received her BA in History from the University of Alaska Fairbanks, and her MA in British and Maritime history from the University of Victoria in British Columbia. She received her doctorate in maritime studies from the University of Greenwich. She edits the peer reviewed online journal Troze for the National Maritime Museum, Cornwall.

Cathryn's current research project is on life saving and coastal communities. This project centres on the private physical manuscript archive of the Shipwrecked Mariners Society. Cathryn has imaged the minute books of the Society, together with associated materials, and is now transcribing the material and exploring the background of the many individuals mentioned therein. The archive is located in Chichester and is purely a paper archive with no electronic finding aids or search engine.

Click here to read the interview with Dr Cathryn Pearce

We will explore how historians approach historical search when they are looking for people, places and dates. We will look at search engines employed by archives and libraries such as the National Archives and the British Library, at search tools provided by digital resources such as British History Online, and at federated search tools such as Connected Histories. We will look at search tools, glossaries, and lookup tables on the MarineLives wiki. Our focus will be on how historians really work, and on how technology can be used to make the day-to-day task of historical search faster and more effective.

An explicit goal of team two will be to understand the semantic properties of the MarineLives semantic media wiki. This wiki was implemented in May 2015 by one of our volunteers, Rowan Beentje. With four million words of full text, over 10,000 manuscript images and over 20,000 pages, improved search will have a dramatic impact for all users of the wiki. A number of potential semantic search plug-ins exist, and we would like our volunteers to specify the functionality our users need and to explore the appropriate semantic search solution.

Historians - we need your input

The MarineLives Digital Pop Up Lab team would like to interview historians about their use of different search engines.

We are seeking to develop a detailed understanding of the types of searches historians perform and wish to perform, and the extent to which current search engines meet their needs.

We would like to explore with historians the specific functionality they would like to see for the MarineLives search engine.

All input will go into our development of a new semantic-based search engine for the MarineLives wiki.

The five search engines we are interested in are (1) The National Archives, Kew: Discovery search engine (2) British Library catalogue (3) British History Online (4) Connected Histories and (5) the MarineLives wiki.

Please contact us if you would like to do a fifteen minute Skype interview with one of our team.

Historians' use of historical search engines - planned interviews

By date

Monday June 27th 2016 @ 11 a.m.: Louise Falcini, PhD candidate, @louisefalcini COMPLETED - SEE INTERVIEW NOTE

Wednesday, June 29th 2016 @ 11 a.m.: John Levin, PhD candidate, University of Sussex, @anterotesis [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Friday July 1st 2016 @ 9.30 a.m. (UK time): Thierry Daunois, University of Lorraine [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Friday July 1st 2016 @ 10.45 a.m.: Harriet Richardson, architectural historian, @FredaWorley [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Monday, July 4th @ 10 a.m.: Dr Jenni Hyde, lecturer, Liverpool Hope University, @wallyberry COMPLETED - SEE INTERVIEW NOTE

Wednesday, July 6th 2016 @ 10 a.m.: Dr Andy Burn, Postdoctoral Research Assistant on 'Social Relations and Everyday Life in England 1500-1640', University of Durham, @aj_burn COMPLETED - SEE INTERVIEW NOTE

Wednesday July 6th 2016 @ 2 pm: Dr Cathryn Pearce, marine historian, @CathrynPearce COMPLETED - SEE INTERVIEW NOTE

Friday July 8th 2016 @ 10 a.m.: Dr James Brown, Research Associate, Intoxicants Project, University of Sheffield, @intoxproject COMPLETED - SEE INTERVIEW NOTE

Date and time to be decided

Professor James Daybell, [Day and time TBC] @JamesDaybell

Dr Nina Lamal, [Day and time TBC] @NinaLamal

Historical search interview guide

We would like to ask early modern historians of all types (social, economic, political, material, cultural, maritime) the following questions in a fifteen minute Skype interview:

(1) What is your experience of historical research?

- level of study (undergraduate, masters, PhD candidate, post-doctoral, early career scholar, established researcher)?
- types of historical research performed?

(2) What search engines do you use to discover and access historical data?

- Google, archival search engines, library search engines, specialised search engines
- Do you use:
-- English National Archives Discovery search engine?
-- British Library catalogue search engines? [If so, which]
-- British History Online?
-- Connected Histories?

(3) What hardware do you use to access historical data?

-- personal: laptop, desktop, iPad, mobile phone?
-- institutional: library terminal?

(4) Choose one historical search engine that you find particularly useful, and talk us through how you use it:

-- Do you structure a search strategy in advance of starting to work with the search engine?
-- Do you write down a search strategy?
-- Do you identify key words or phrases to search for?

(5) Staying with the historical search engine you have chosen in your response to (4), talk us through how you would:

-- research a person?
-- research a place?

(6) How do you capture the results of your searches?

-- do you create a word document or Excel spreadsheet in which to store the searches?
-- do you keep a record of the search terms which generated your results?
-- do you store the search results and extracts of the records they refer to in the same word document or Excel spreadsheet?

(7) How do you sequence your searches and your use of search engines?

-- do you work methodically through predefined search terms in one search engine and then move on to the next one?
-- do you have multiple search engines open at the same time and move backwards and forwards between them in response to specific research results?

(8) Have you performed searches on the MarineLives wiki?

-- If so, please tell us about your experience of searching the MarineLives wiki?
-- What tools have you used to find data on the wiki (vertical sidebar; lists of deponents; thematic pages; search box in top right hand corner of each wiki page)?
-- What improvements would you like to see to MarineLives wiki searchability and discoverability?

Interview guide with Thierry Daunois

(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

Interview notes on historical search strategies and use of search engines

Louise Falcini, PhD candidate, University of Reading

Skype interview: Monday June 27th 2016 @ 11 a.m.
Interviewee: Louise Falcini @louisefalcini Academia.edu profile
Interviewer: Colin Greenstreet @Marinelivesorg Academia.edu profile

Q1

Louise Falcini is a highly experienced archivist, who used to work at the London Metropolitan Archives. She is now at the write-up stage of her PhD dissertation, which she has embarked on as a mature student. Her dissertation concerns the London poor in the long C18th. Her PhD dissertation supervisor is Professor Tim Hitchcock.

Q2

Louise approaches historical search with a very clear sense of what she is looking for, and has a good sense of the location and nature of the archives, which are likely to yield fruitful information.

Her use of search engines for research is focussed, with very limited use of serendipitous or wild card approaches.

Asked to name the four or five main search engines she uses, Louise named without prompting (in the order of their naming): (1) The National Archives Discovery search engine - simple not "advanced" screen (2) London Metropolitan Archives online catalogue (3) A2A [Access to Archives], now part of the National Archives Discovery search engine (4) London Lives. On prompting, Louise named four additional search engines: (1) Connected Histories (2) British History Online (3) Old Bailey Online, which she accesses through London Lives (4) British Library.

Louise makes considerable use of TNA Discovery Engine, LMA online catalogue, and London Lives. She rarely uses the British Library Online Catalogues.

Q3

Louise works primarily at home from her desktop computer. She has two screens, one larger than the other. She does not use library or archival terminals for search, though she will use them to order up material.

Q4

Louise identifies sets of potential search terms to support a specific research strategy and will then implement them in the appropriate search engine or online catalogue.

She makes use of "Sounds like" features, where they are available. An example of this is for London Lives. See London Lives information on search engine functionality

Louise makes considerable use of Google Books when searching for secondary sources. She also uses Early English Books Online and Echo.

Locating London's Past is useful for its detailed maps, notably John Rocque's 1746 map of London. It also has useful functionality allowing the mapping of key words.

Q5

This question was not discussed.

Q6

Louise captures the results of her searches in the software package Zotero.

Some search engines allow autosave of data into Zotero, for example Old Bailey Online. See Organising Your Research With Reference Management Tools (e.g. Zotero)

She regularly shares her Zotero files with her PhD supervisor, who will comment on new potential sources and use of existing sources, but does not directly annotate the Zotero files.

Q7

This question was not discussed.

Q8

Louise has not performed research-oriented searches on the MarineLives wiki, since the mid-C17th falls outside the period of interest for her PhD dissertation. She was therefore unable to comment on search functionality and user experience of the MarineLives wiki.

John Levin, PhD candidate, University of Sussex

[ADD DATA]

Q1

John has Master's degrees in history and digital humanities from King's College London and from University College London. He is currently a full-time PhD candidate at the University of Sussex, where he is supervised by Professor Tim Hitchcock and XXXX.

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Thierry Daunois, University of Lorraine

(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

[ADD DATA]

Q1

Thierry's undergraduate education was at the Reims Business School (today Neoma Business School). He subsequently started his own marketing services business serving small French businesses, which he ran for five years. He then returned to higher education and studied for a Masters in Business Intelligence at the University of Nancy, graduating in 2009. Since then, Thierry has been employed at the University of Lorraine in Nancy working for the Central Direction responsible for University partnering with external bodies. In that role, Thierry has acquired considerable experience of semantic media wiki technology being used by different scientific partners. He is currently considering embarking on a PhD programme in digital humanities, which would start in September 2016.

Q2

Q3

Q4

Q5

Q6

Definitely: a semantic wiki! I took part in the development of Wicri, a network of semantic wikis, which counts about 170 wikis, either on a geographic scale (from a regional to a continental scale, from Wicri/Lorraine up to Wicri/Europe), or on a thematic one (Wicri/Archéologie, Wicri/Urban soil, Wicri/Psychologie, Wicri/Agronomie...). Wikis are definitely a very good collaborative way of capitalizing data, flexible enough to adapt to the real life of projects: you often start a project with an idea of what you could reach, but need to re-orient the whole thing when facing real life...

Q7

Q8

Harriet Richardson, Survey of London

[ADD DATA]

Q1

Harriet's undergraduate studies in English were at the University of Nottingham. She then studied for an MLitt in architectural history at the University of Saint Andrews. Following graduate studies Harriet worked on a study of Scottish hospitals, and later moved to London, where she joined the Survey of London.

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Dr Jenni Hyde

Q1

Dr Jenni Hyde was awarded her PhD from Manchester in 2015. She is a part-time lecturer at Liverpool Hope University and an honorary member of the History Department at Lancaster University.
She specialises in Early Modern ballads, usually C16th, with a second research interest in Protestant martyrs under Queen Mary.
Jenni's research treats ballads as an historical and musical resource, drawing on her earlier professional career as a music teacher.
She is based near Preston in Lancashire.

Q2

Jenni named three top search engines, which she uses in her research.
(1) Google (2) Early English Books Online (EEBO) (3) State Papers Online
When working on a new research topic, Jenni will start with Google. If she gets too many results for a given search in Google, she will use the same search terms but filtered through Google Books and/or Google Scholar and/or Google Images.
On further prompting, Jenni mentioned using (4) British History Online (5) The National Archives Discovery - Advanced Search.
The content of Old Bailey Online is too late for Jenni's work.

Q3

Jenni works predominantly from home, where she uses a laptop, without a second screen. She also has a tablet, which she uses in archives and libraries to take notes.
When working on her laptop, Jenni will have up to ten windows open at any one time. She will close windows when the screen gets too full.

Q4

We took Google as an example to explore how Jenni works with a search engine.
A typical search would be to look in Google for a line of a ballad. A line of a ballad could contain up to ten words, all of which Jenni will enter into the search box, constrained by inverted commas. She will remove the inverted comma constraint if nothing turns up on the search. She will also try alternative spellings - sometimes Google will come up with alternative spellings through its algorithms, but it is nevertheless worth adjusting the search terms for alternative spellings as well.
Jenni works with the default Google setting of ten results per page, and will work through up to five or six screens of ten results per page, before trying something new. Her rule of thumb for initiating a new search is when the results on a given results page of ten results start to look very obviously not relevant.
Easy accessibility is important in determining which search engines Jenni uses - she mentioned that Google Scholar was less visible as a Google service than Google, Google Books and Google Images, which she found irritating.

State Papers Online has quite good fuzzy search functionality, but it is hard to browse results around a given result.
For example, if the term "Thomas Cromwell" produced an interesting result in SPOnline, Jenni will probably want to look at entries for one or two weeks before and after that entry, but it is not easy to get to those results and to browse them.
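
The browsing behaviour Jenni describes, looking at entries a week or two either side of an interesting hit, can be pictured as a simple date-window filter. The following is a minimal sketch and not a description of how State Papers Online works: the `Entry` record, the `entries_near` function and the sample entries are all hypothetical, and Old Style/New Style calendar questions are ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    """Hypothetical catalogue entry: a date and a short description."""
    when: date
    title: str


def entries_near(entries, hit, weeks=2):
    """Return entries dated within `weeks` weeks either side of `hit`, in date order."""
    window = timedelta(weeks=weeks)
    nearby = [e for e in entries if abs(e.when - hit.when) <= window]
    return sorted(nearby, key=lambda e: e.when)


# Invented sample data, for illustration only.
entries = [
    Entry(date(1536, 10, 1), "Letter A"),
    Entry(date(1536, 10, 10), "Letter mentioning Thomas Cromwell"),
    Entry(date(1536, 10, 20), "Letter B"),
    Entry(date(1536, 12, 1), "Letter C"),
]
hit = entries[1]
print([e.title for e in entries_near(entries, hit)])
```

With the default two-week window the sketch returns the hit together with Letters A and B; narrowing to `weeks=1` leaves only the hit itself.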

Q5

We worked through Early English Books Online as a second example of historical search.
Jenni makes great use of this resource, since it contains key content. However, she finds the resource slightly annoying to use, and criticised some of its functionality.
She stated that "all the content is there, but if don't put in exactly the right terms you don't get easily to the content".
An example of this need for exactness is search by bibliographical reference. The inputted reference needs to follow the exact format recognised by the EEBO search engine, including spacing between letters and numbers.
Jenni also criticised the EEBO "sounds-like" function, stating that key word search claims to have a sounds-like function, but that for her purposes it was "not fuzzy enough", and that the degree of fuzziness could not be controlled.

Jenni would like to see two specific improvements to EEBO search:
(1) More "flexibility on results"
(2) Semantic date search, with the EEBO search engine recognising dates contained in the full text of documents and being able to produce results which reference those dates, rather than simply the recognised date of the document. In the case of C16th documents in EEBO, many documents are simply dated 1500-1599. As a result it is hard to date-constrain results, and Jenni gets too many results.
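
One way to picture the semantic date search Jenni asks for is to pull candidate years out of the full text and filter on those, rather than on the catalogue date. This is a minimal sketch under loud assumptions: the regular expression only catches four-digit years from 1500 to 1599, the function names and sample texts are invented, and nothing here reflects how EEBO is implemented.

```python
import re

# Matches four-digit years 1500-1599 only; regnal years, Roman numerals
# and spelled-out dates are deliberately out of scope in this sketch.
YEAR_RE = re.compile(r"\b(15\d{2})\b")


def years_mentioned(text):
    """Return the set of sixteenth-century years that appear in `text`."""
    return {int(y) for y in YEAR_RE.findall(text)}


def filter_by_year(documents, start, end):
    """Keep titles of documents whose text mentions a year in [start, end]."""
    return [
        title
        for title, text in documents.items()
        if any(start <= y <= end for y in years_mentioned(text))
    ]


# Invented sample texts, for illustration only.
documents = {
    "Ballad A": "Printed at London in the yeare 1536 ...",
    "Ballad B": "A newe ballade made in 1588 ...",
    "Ballad C": "No date appears in this text.",
}
print(filter_by_year(documents, 1530, 1540))
```

A real implementation would need far richer date recognition, but even this crude filter shows how a document catalogued only as 1500-1599 could still be narrowed to a decade.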

Q6

Jenni's preferred storage method for her research results is large Word documents. In the past she has tried using Evernote, which takes text off a page and stores it in a notebook, but has not found this tool very useful and has discontinued using it.

When starting work on a new research theme (e.g. writing an article on the Pilgrimage of Grace) she will create a new Word document named "Pilgrimage of Grace". She will then paste research results and document extracts into this Word document. These documents can become as large as 1000 Word pages. She will then search within the Word document using standard Word search tools when synthesising the material. She tends to keep the Word document going until she has finished her work on that theme, rather than creating multiple Word documents. At the end of a specific piece of work, she will archive these Word documents, sometimes combining several Word documents into larger Word documents for archiving purposes.

Jenni uses Excel and Access to manipulate data from these Word research documents, but will not tend to enter data directly from her searches into Excel and/or Access, preferring to go through the Word document intermediate stage.

Q7

See answer to Q4 regarding use of Google, and then filtering results via Google Books, Google Scholar and Google Images.

Q8

Jenni has not been an active searcher on the MarineLives wiki for her own research, so we did not pursue this question.

Dr Andy Burn, University of Durham

This interview was focussed on interacting with the MarineLives wiki to generate improvement ideas, and so did not address all eight questions in the interview guide.

Q1

Dr Andy Burn, Durham University

Dr Andy Burn is a post-doctoral fellow at the University of Durham. His current research concerns "Social Relations and Everyday Life in England, 1500-1640", a Leverhulme-funded project led by Professor Andy Wood.^[2] The first year of this project involved extensive research across England in local record offices and archives in which Andy examined mainly legal documents generated by Church and National courts. Andy's research is now moving more online and will mine State Papers (using State Papers Online) as well as Early English Books Online (EEBO), plus local records accessed electronically. See academic profile

Q2

Andy named unprompted the following five search engines as the main search engines he currently uses.

(1) Google
(2) Google Scholar
(3) National Archives Discovery - Advanced Search
(4) EEBO
(5) State Papers Online

Andy uses Google if he is looking for a specific book. Google is ideal for C19th books, since their full text is available via Google. He uses Google Scholar if he is looking for an article, but commented that accessing Google Scholar is not easy, since the URL is fairly hidden by Google. Andy commented that Google is "quite forgiving", so that a typo in the title of a book or a rough version of the title of a book, will still generate useful results. In contrast to many Google users, who typically use between one and three words in a query, Andy tends to use a significant number of words in his Google queries (ranging between five and ten). He stated that this was due to the specific searches he tends to use Google for - searching for the titles of books and articles.

When looking for archival material, Andy makes use of The National Archives (TNA) Discovery Advanced Search. Typically he will date-constrain his TNA search. Often he will also constrain his search to a specific record series, e.g. Exchequer, or Chancery records. He does this by typing in the broad class mark of the record series (e.g. E 134) and then adding the search term or terms he is looking for. Andy tends to be quite specific in the archival series he constrains his searches by, based on good knowledge of the record series and which series are likely to be of use to his research. He does not make use of broader "faceted" type constraints, e.g. "legal", which can be applied using TNA Discovery.

Andy commented that he is using TNA Discovery very differently than he uses Google - in the case of Google it is to get to a specific document, e.g. a book or article, which ideally he will then be able to read online. In the case of TNA Discovery, he is typically preparing a visit to the physical archive. He gave the example of lining up three hundred physical archival documents to view on a three day trip to TNA in Kew. When using TNA Discovery, Andy will typically constrain the search results just to documents held by TNA (excluding results from other archives).

Q8

Andy has performed a series of searches on the MarineLives wiki when it was in its old format, hosted by wikispot, with the content organised into separate wikis by volume of depositions. He has not searched on MarineLives wiki since its consolidation and expansion in May 2015.

We performed two experimental searches on the MarineLives wiki during the interview, which highlighted a number of issues and improvement ideas.

Experiment: two search tasks using the MarineLives wiki

MarineLives wiki: "price of coal"

Google: "MarineLives" + "price of coal"

We explored together how Andy would approach answering two specific research questions using the MarineLives wiki:

(a) Task One: Find five ships containing coal from Newcastle
(b) Task Two: Find the price of coal

In the case of task one, Andy performed his first search using the words Newcastle and Coal. This search yielded fourteen results, with Andy choosing to click on the link in the results for the wiki page HCA 13/72 f.52v Annotate. Once on this page, Andy used CTRL+F on his keyboard to open a search box for the page and typed in Newcastle. This yielded three uses of Newcastle on the page, but interestingly all three uses were in the People note section of the page, rather than the transcription of the image on the page. This led Andy to suggest an improvement to MarineLives wiki search functionality whereby a researcher could specify the portion of any page he or she wished to search, e.g. Transcription; Annotations; Metadata.
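
Andy's suggestion can be sketched as a search function that takes an optional list of page sections. This is an illustration only: the section names follow his examples, while the `search_pages` function, the page dictionaries and their contents are hypothetical and do not reflect how the MarineLives wiki stores or indexes its pages.

```python
def search_pages(pages, term, sections=None):
    """Return titles of pages where `term` occurs in the chosen sections.

    `pages` maps a page title to a dict of section name -> text.
    With `sections=None`, every section of every page is searched.
    """
    needle = term.lower()
    hits = []
    for title, page in pages.items():
        names = sections if sections is not None else page.keys()
        if any(needle in page.get(name, "").lower() for name in names):
            hits.append(title)
    return hits


# Invented page content, for illustration only.
pages = {
    "Page 1": {
        "Transcription": "laden with coales bound for London",
        "Annotations": "The deponent lived in Newcastle.",
    },
    "Page 2": {
        "Transcription": "the shipp came from Newcastle with coales",
        "Annotations": "",
    },
}
print(search_pages(pages, "Newcastle"))
print(search_pages(pages, "Newcastle", sections=["Transcription"]))
```

Searching all sections returns both invented pages, whereas restricting the search to the transcription drops the page where Newcastle appears only in an annotation, which is the situation Andy ran into.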

In the case of task two, Andy typed in price of coal without inverted commas. However, this did not yield useful results. At the interviewer's suggestion he typed in "price of coal" constrained by inverted commas to force search for the exact phrase. This led to three results, all of which contained the phrase "price of coal". However, a curious feature of the current MarineLives wiki search function is that the snippets displayed in the results page do not highlight the constrained phrase. This can be compared with the superior Google snippets which are produced when a constrained search is performed using "MarineLives" and "price of coal".
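
The snippet behaviour being compared here can be sketched as follows: find the exact phrase, cut a window of context around it, and mark the match. This is a minimal sketch assuming plain-text pages; the `snippet` function, the `**` markers and the sample sentence are invented, and it says nothing about how the wiki's or Google's search is actually implemented.

```python
import re


def snippet(text, phrase, context=30, marker="**"):
    """Return a short extract around the first case-insensitive match of
    `phrase`, with the match wrapped in `marker`; None if there is no match."""
    match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    before = text[start:match.start()]
    after = text[match.end():end]
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{before}{marker}{match.group(0)}{marker}{after}{suffix}"


# Invented sample sentence, for illustration only.
text = "He saith that the price of coal at that time was very high."
print(snippet(text, "price of coal", context=10))
```

Marking the matched phrase inside the extract lets a researcher judge a result from the results page alone, without opening each page and using CTRL+F.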

MarineLives wiki search improvement ideas

(1) Andy suggested it would be useful to specify sections of the standard wiki pages to search on, e.g. Transcription text only; Annotations only; Metadata only

(2) He is a frequent user of fuzzy search on other search engines and finds this lacking in the MarineLives wiki. He gave the example of searches involving the term Newcastle, in which he would want his search to include name variants such as New Castle and New Castell. This can be done either using fuzzy search functionality or by the use of wild cards such as * or ?. He noted that there are different types of fuzzy search functionality, and that some are better than others. He was not keen on fuzzy search on State Papers Online, and observed more generally that the site was very slow in responding to searches, and that he often found it hard to find a document which he knew existed and had to be in the State Papers Online data repository.
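
The two routes Andy mentions, fuzzy matching and wild cards, can be illustrated with Python's standard library. This is a hedged sketch: `difflib.SequenceMatcher` stands in for whatever similarity measure a real search engine uses, the 0.8 threshold is an arbitrary assumption, and the helper names and word list are invented.

```python
from difflib import SequenceMatcher
from fnmatch import fnmatchcase


def normalise(name):
    """Lower-case and drop spaces so 'New Castle' compares as 'newcastle'."""
    return name.lower().replace(" ", "")


def fuzzy_match(query, candidate, threshold=0.8):
    """True if the normalised strings are at least `threshold` similar."""
    ratio = SequenceMatcher(None, normalise(query), normalise(candidate)).ratio()
    return ratio >= threshold


def wildcard_match(pattern, candidate):
    """True if `candidate` fits a pattern using * (any run) or ? (one character)."""
    return fnmatchcase(normalise(candidate), normalise(pattern))


candidates = ["Newcastle", "New Castle", "New Castell", "Norwich"]
print([c for c in candidates if fuzzy_match("Newcastle", c)])
print([c for c in candidates if wildcard_match("newcast*", c)])
```

Both approaches pick up the three Newcastle variants and reject Norwich. The difference is control: the wild card pattern says exactly where variation is allowed, while the fuzzy threshold trades recall against false matches, which may be why some implementations feel better than others.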

Asked about his use of advanced operators such as AND and OR in queries, Andy stated that he rarely used them, though he did use inverted commas to constrain a phrase.

Asked about his use of natural language questions instead of small numbers of key words in queries, Andy said that he would use natural language questions in Google for a very specific type of enquiry - when he wanted to know the modern place name equivalent of a C16th or C17th place name. He had discovered from his use of Google that a query such as What is the place name [SPECIFY PLACE NAME] called now? would often yield an answer, typically in the form of a genealogical website page with the old and modern name equivalents.

Dr Cathryn Pearce, University of Greenwich

Dr Cathryn Pearce, Visiting Lecturer, University of Greenwich

Q1

Dr Cathryn Pearce is an American maritime historian, living and working in the South West of England, who has studied and worked in Alaska, Canada and England. She was an active transcriber in the MarineLives project team back in 2012, when the project was first established. She received her BA in History from the University of Alaska Fairbanks, and her MA in British and Maritime history from the University of Victoria in British Columbia. She received her doctorate in maritime studies from the University of Greenwich. She edits the peer reviewed online journal Troze for the National Maritime Museum, Cornwall.

Cathryn's current research project is on life saving and coastal communities. This project centres on the private physical manuscript archive of the Shipwrecked Mariners Society. Cathryn has imaged the minute books of the Society, together with associated materials, and is now transcribing the material and exploring the background of the many individuals mentioned therein. The archive is located in Chichester and is purely a paper archive with no electronic finding aids or search engine.

Q2

Cathryn uses a number of search engines to expand her knowledge of primary materials from the Shipwrecked Mariners Society. Her particular focus is on identifying individuals named in the archive of the Society and to establish links between the individuals.

Cathryn listed the following search engines as ones she uses relatively frequently:
(1) Google (including Google Books and Google Scholar)
(2) British History Online
(3) University accessed search engines, e.g. House of Commons parliamentary papers, EEBO, JSTOR
(4) British Newspaper archive (private subscription)
(5) London Lives/Old Bailey (Cathryn praised these two sites for content and functionality)

In the case of Google, Cathryn starts with Google or sometimes with Google Books or Google Scholar. For full text online books she has pre-identified as being for frequent reference, Cathryn stores their URLs as tabs in her Chrome browser toolbar. Chrome is her current standard browser, having moved from Firefox/Mozilla.

When using Google to explore a specific topic, Cathryn identifies keywords and enters them as a query, initially without quotation marks to allow for some search flexibility. For example, when working on a paper about the Royal Navy and the beginnings of the Shipwrecked Mariners Society, Cathryn has been seeking to identify a number of naval personnel and the links between them. She searched for admiral sir george cockburn, without inverted commas around any or all of the four keywords, to allow for variations such as admiral george cockburn and sir george cockburn. If her search query is yielding too many results, or if the results are not yielding the right type of material (for example, wrong period), she will refine her query, adjusting the keywords used, and possibly adding inverted commas.

In a different type of Google search, for a specific book, Cathryn will type in all or most of the title of a book to access the book in Google Books.

Cathryn is prepared to look through up to ten pages of results from a specific query, but typically would look through only three or four pages of results.

One piece of semantic functionality which Cathryn would greatly value is the ability to specify the time period (century or decade) she is interested in, with Google, or whichever search engine she is using, able to detect from the full text of a page or series of pages the time period in which a person, place or object is being mentioned. At the moment, simply adding C19th or nineteenth century to a search string in Google has very little effect in terms of filtering results by time period.

We briefly discussed the search engine behind the House of Commons Parliamentary Papers. Cathryn stated that she hated using this tool, although she needed to. Her main objection was that when looking for a specific act or bill, which she knew existed, she found it hard to get to the data. The search engine behind the content produced too many wrong sources, requiring patience to sift through the irrelevant results.

We even more briefly discussed Early English Books Online, which Cathryn uses, but had not used recently. From her memory of using the resource she stated that keyword search works fairly well.

We briefly touched on the use of advanced operators in formulating queries, in addition to the use of inverted commas to constrain a search. Cathryn noted that she used wild cards (e.g. *) in queries - for example ship wreck* to get to ship wreckers and ship wrecking as well as ship wrecks.

Q3

Cathryn works mainly from home, using a desktop and a laptop simultaneously. This gives her two screens on which to display open windows, though she would like to add an additional flatscreen monitor for the desktop.

Q4

See answer to Q3.

Q5

See answer to Q3.

Q6

Cathryn uses a number of software tools to capture her research notes, and differentiates between the types of notes she keeps in each tool.

She uses Word documents for synthesis, but would include pasted URLs and some pasted text. She always records the keywords which she uses to generate specific results. She uses EverNote to capture web pages directly, describing it as a way of capturing "what you see". She also uses Zotero for the capture and management of bibliographical information. Although a long term user of Zotero, Cathryn admitted that she did not use all its features.

Cathryn described in detail how she organises her academic research and writing workflow. She had a clear vision for integrating her research with her writing, and described using the software package Scrivener in its Microsoft version as her workflow management and integrating tool.

When Cathryn is working in the text of an academic article she can import material into her Scrivener document management software from research files she keeps in Word and from other research files she keeps in EverNote. Cathryn describes Scrivener as allowing her to pull in all her research with separate screens for research and writing. This enables her to view both her notes and her written text at the same time. Cathryn first heard about Scrivener five years ago, when it was only available for the Mac operating system. She has now been using the software for four years, since a Microsoft version was made available. Cathryn praised the software for allowing her to change the structure of documents whilst they are being written, from paragraph level on up to the top level organisation of her material. Text can be exported from Scrivener on completion into Word for final formatting, or into rich text format, if this is preferred. There is also a Cork board section, which enables the moving around of subtitles.

Cathryn does not integrate her Zotero bibliographical data files directly into Scrivener, but instead cuts and pastes. However, it appears that code was developed in 2015 to do this integration.^[3]

Despite her significant sophistication and interest in software tools for workflow management and document creation, Cathryn admitted to still occasionally making use of physical index cards.

Q7

Cathryn described herself as being well organised, with a good general idea of what she is going to be doing in terms of research goals and tasks each day. She organises her research around research questions and around the specific section of an academic paper she might be working on at a particular time. The precise search engines and electronic resources she may use during a day will vary according to the need of the task, and she is flexible in moving between engines and resources.

Q8

Cathryn is familiar with MarineLives wiki content from her involvement as a volunteer transcriber in 2012, when MarineLives used the wikihost named wikispot. She has not yet used the MarineLives integrated wiki, which we launched in May 2015, for personal research. She plans to try out some searches on the new wiki and then will get back to us with ideas for improving wiki search functionality and discoverability.

Bibliography

Jessica Jewell, 5 reasons to write your thesis in Scrivener, blog article, November 11, 2013
Zotero Forum: Plugins: Scrivener and Zotero Integration, latest entry: August 14th 2015

Dr James Brown, University of Sheffield

Q1

Dr James Brown is based at the University of Sheffield, where he has an office. He is affiliated both to the Sheffield HRI Digital group and to the Sheffield history faculty. See academic profile. He is currently one of two research associates on the project 'Intoxicants and Early Modernity: England, 1580-1740' (ESRC; PI: Professor Phil Withington). James completed his PhD at the University of Warwick on Inns, Taverns and Alehouses in Early Modern Southampton in 2008. Between 2009 and 2013 he was project coordinator and then digital project manager for 'Cultures of Knowledge: Networking the Republic of Letters, 1550-1750' at the University of Oxford (Mellon Foundation; PI: Professor Howard Hotson), overseeing (inter alia) the development of its union catalogue of sixteenth-, seventeenth-, and eighteenth-century correspondence, Early Modern Letters Online.

Q2

The focus of the intoxicants project is heavily towards primary sources. These primary sources fall into four categories: (1) Port books (2) Church court depositions and quarter sessions examinations (witness statements) (3) Licences, orders and presentments (4) Probate inventories. All four categories are in manuscript, rather than printed text (although a fifth strand, led by Co-I Dr Angela McShane, is exploring objects and artefacts in museum collections). The project is looking at two cultural provinces: (1) The North West of England (Cheshire and South Lancashire) (2) East Anglia. The period covered is broadly 1580-1740. James focusses on the North West of England, and a second postdoctoral research fellow, Tim Wales, works on East Anglian sources. A main output of the project will be a database of early modern intoxicants and intoxication, which will federate the datasets arising from the analysis of the primary source materials.

Asked unprompted to name four or five search engines which James uses heavily, James named (1) TNA Discovery (2) Lancashire Record Office catalogue (LanCat) (3) Cheshire Archives and Local Studies Catalogue (4) FindMyPast.

When prompted, James stated that he used Google to a limited extent, which was a consequence of his focus on discovering manuscripts for physical inspection at physical archives. Google was not good for identifying manuscripts and their locations.

James prefers to access the electronic catalogues of the Lancashire Record Office and the Cheshire Archives and Local Studies Office directly, rather than accessing them through TNA's Discovery engine (which now incorporates the earlier A2A service for search of local record office metadata). The reason for this is the superior ability to browse metadata electronically directly in the local record office portals compared with TNA Discovery.

His work using online search engines is typically in preparation for an archive trip. This is similar to the search approach prior to archival visits described by Dr Andy Burn (University of Durham) in an earlier interview. James's purpose in working with online search engines is to prepare long lists of documents to be viewed at various physical archives.

James praised the functionality of both the Lancashire Record Office catalogue (LanCat) and the Cheshire Archives and Local Studies Catalogue. Specifically, he stated that they had superior capabilities for narrowing down searches to specific types of record collection compared to TNA Discovery. In the case of the Cheshire catalogue, James was able to go straight to church court papers and then to browse them effectively and efficiently.

James had clear views on what makes for a user friendly search engine. He strongly prefers drop down menus when choosing parameters by which to narrow his searches. If you are not particularly familiar with a record series, a blank box or space requiring a precise series number can be alarming! Asked about his use of date restriction as a parameter by which to narrow searches, James stated that TNA Discovery had considerably better date restriction capabilities than the Lancashire and Cheshire search engines. However, in his work with the Lancashire and Cheshire electronic catalogues, he was interested in long time series of documents, so precise date restriction was not of great importance for his research purposes.

The crucial functionality James looked for, and valued in the case of the Lancashire portal, was the ability to find a document of interest and then to browse around it, or alternatively to choose a geographical area and time period and then to browse for material from that point.

Cheshire Archives & Local Studies online browser - Diocese of Chester example. Click on plus button 'C', then on plus button on '5', then on plus on each year to get full list of cases, organised by place

We worked through a case example of James exploring Church court material in the Cheshire archives. He would start by narrowing his search to the diocese of Chester, and would then find Church court papers in the series EDC 5. Clicking on the high level metadata for these records would reveal the full run of papers. These papers have been catalogued in considerable detail by John Addy, who has organised his catalogue by date and place. The catalogue, which is electronically available, goes down to case level metadata for plaintiff and defendant, together with an abstract of each case. The existence of this catalogue enables systematic and structured searching, and could be a model, James suggested, for future MarineLives wiki data structuring and browsing/search functionality.

Working in the Cheshire catalogue browser, it is possible to see the metadata organised by years. A button click will then reveal all cases in a specific year organised by place. Expanding the data for a particular place will then reveal metadata for plaintiff and defendant, together with a case abstract. The very clear document hierarchy is a great strength of this particular document series, and has been well enabled technically. James contrasted his use of browsing to discover and record cases he wanted to view subsequently in the physical archive with the alternative of using search.

James suggested that search was useful to him for different sorts of research question, giving the example of thematic research looking for topics like "ale", "beer" and "wine". In this case the ability to use search effectively would be greatly enhanced if there had been semantic mark-up of the themes, or if alternatively there had been analytical work done by prior researchers on those topics and the work recorded in the metadata, keywords or in case abstracts. In the absence of that work it would be necessary to work through large quantities of physical documents to search for topical material.

Q3

University of Sheffield HRI designed data creation tool: Screen Shot One - Deposition entry for Chester Quarter Session

University of Sheffield HRI designed data creation tool: Screen Shot Two - Deposition entry for Chester Quarter Session

University of Sheffield HRI designed data creation tool: Screen Shot Three - Data modelling environment

James works out of his office at the University of Sheffield and at regional and local physical archives. When working in his office his hardware consists of a MacBook and large second screen. He uses the large screen mainly for transcription.

He described a structured workflow, with preparatory work done in the office using electronic browsing and search to create an extremely large Excel spreadsheet of all references of interest; for example, identifying relevant Port Books in the TNA Discovery catalogue. Interestingly, James stated that the Excel spreadsheet was a superior way to browse metadata compared with the TNA Discovery tool itself. At the physical archive, initial annotations would be taken into Excel, and images would be taken of the manuscript material. Ultimately archival references, transcriptions and other metadata are inputted into a bespoke system developed by the University of Sheffield HRI Digital group for the Intoxicants project. This system facilitated data entry, and had useful functionality for the monitoring of workflow. James also mentioned the software tool BaseCamp, which he described as a dedicated project management system.

The existence of the bespoke system meant that James did not use Word, EverNote, Zotero or Scrivener, which are various software tools mentioned by other interviewees when they described their research workflows.

Q4

N/A

Q5

N/A

Q6

N/A

Q7

Will be launching database at end of this year - will contain transcriptions. Two-step process for legal depositions - not complete transcriptions, are modernising. Will publish metadata. Doing event modelling - modelling the intoxicant-related events and then mapping and modelling. Not publishing images. Have tens of thousands of images. Interested in analysis and visualisation. Not an editorial project or edition.

Q8

James congratulated the MarineLives project team and volunteers for creating "an amazing resource". He is familiar with the MarineLives wiki, and has both browsed and searched on the wiki. He commented that he felt he and the Intoxicants team had only started to scratch the potential in the wiki for their own work.

James had a number of specific and practical suggestions to improve both the look and feel of the wiki, its browseability and its searchability.

(1) He suggested that the main page of the wiki needed to lay out its stall more clearly and more powerfully. Currently it had a lot of fascinating material, which could draw casual browsers in, but he recommended cutting the size of the front page and reprioritising the material. James had an immediate impact with this statement, and we have already re-designed the front page, and have hopefully put into practice his suggestions, by introducing a float box for the history and achievements of the project, and another float box recognising the contribution of our many volunteers and helpers. Despite his comments regarding the front page, James praised the organisation of the wiki, as evidenced in the vertical side bar on every wiki page, which stresses the nested hierarchy of different document types on which the wiki is built.

(2) He suggested a more systematic approach to accessing and searching metadata for cases and deponents, building on his own experience of working with C16th and C17th court records, and stressing the benefits of clear hierarchical metadata and abstracts. James has given a very useful impetus here to the work being done by Colin Greenstreet and Thierry Daunois in Team Two of the Digital Pop Up Lab on tailored and semantic search, which we hope to show James for comment later this summer.

(3) James emphasised the power of picking out the core information on the High Court of Admiralty deponents - age, occupation, residence, as well as name - and facilitating a more structured sifting and sorting of this data. We explored several volume level listings which the MarineLives project team have created for deponents, organised by folio, by name, by geography, by occupation and by age. An example of these listings can be seen for the volume of Admiralty Court depositions HCA 13/72 for the years 1656 to 1658:

HCA 13/72 Finding Aids

Alphabetical - HCA 13/72 deponents listed alphabetically

By Age - HCA 13/72 deponents listed by age

By Geography - HCA 13/72 deponents listed by geography

By Occupation - HCA 13/72 deponents listed by occupation
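The kind of structured sifting and sorting behind these listings can be sketched in a few lines of Python. This is an illustrative sketch only, not the way the MarineLives listings were produced: the Deponent record, its field names and the three sample entries are all invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Deponent:
    """Hypothetical record holding the core facts about one deponent."""
    name: str
    age: int
    occupation: str
    residence: str


# Invented placeholder entries, NOT taken from HCA 13/72.
DEPONENTS = [
    Deponent("Example Deponent A", 34, "Mariner", "Wapping"),
    Deponent("Example Deponent B", 52, "Merchant", "London"),
    Deponent("Example Deponent C", 27, "Mariner", "Ratcliff"),
]


def listing(deponents, key):
    """Return the deponents sorted by one field, e.g. 'name' or 'age'."""
    return sorted(deponents, key=lambda d: getattr(d, key))


def grouped(deponents, key):
    """Group deponent names under each distinct value of one field."""
    groups = defaultdict(list)
    for d in deponents:
        groups[getattr(d, key)].append(d.name)
    return dict(groups)


by_age = [d.name for d in listing(DEPONENTS, "age")]
by_occupation = grouped(DEPONENTS, "occupation")
```

The same two functions serve every listing: sorting by name gives the alphabetical list, and grouping by residence or occupation gives the geographical and occupational lists.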

R Lygon, Map of Barbados, 1657

(4) He encouraged the MarineLives team to look at ways to code geographical data within the wiki and to visualise such data, and more generally to look at data visualisation options.

(5) He suggested that it would be useful to have structured search forms, which led to a discussion of the potential to exploit the semantic capabilities of the MarineLives wiki, which we have neglected.

(6) He praised the use of full text transcriptions and the high quality supplementary material, such as the material on the Silver Ships.

Standard search box on the MarineLives wiki

The standard search box in the top right-hand corner of each MarineLives wiki page searches as follows:^[4]

- It searches all pages on the wiki with some restrictions.
- The article content is searched in its raw (wikitext) form - i.e., it searches the text that appears in the edit box when you click "edit", not the rendered page. This means that content coming from an included template will not be picked up, but the target of piped links will be.
- The search is not case-sensitive, so 'MediaWiki', 'mediawiki' and 'MEDIAWIKI' all give the same result.
- The search functionality can be considered to operate on whole words, separated by spaces or other punctuation marks. So if your search term includes the word 'book', the results will not include pages that only have the word 'books' or 'booklet'. And if your search term includes the term 'inter', the results will not include pages that only have the word 'international', but they may include pages that have the term 'inter-national'.
- The results will only include pages that contain all the words in your search.
- You can search for a phrase using double quotes. A phrase can be considered to consist of whole words (case-insensitive), so the phrase 'Prime Minister' will not be found by a search for "ime Min", but it will be found by a search for "pRIME mINISTER".
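The matching rules listed above can be illustrated with a short Python sketch. It is a simplified model of the described behaviour, not MediaWiki's actual search code: the function names are invented for illustration, the tokenizer only recognises unaccented letters and digits, and the restriction to raw wikitext is not modelled.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into whole words on spaces and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into quoted phrases (as word lists) and remaining single words."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]*)"', query)]
    remainder = re.sub(r'"[^"]*"', " ", query)
    return [p for p in phrases if p], tokenize(remainder)


def contains_phrase(tokens, phrase):
    """True if the words of the phrase occur consecutively in the token list."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(page_text, query):
    """True if the page contains every query word and every quoted phrase."""
    tokens = tokenize(page_text)
    token_set = set(tokens)
    phrases, words = parse_query(query)
    return (all(w in token_set for w in words)
            and all(contains_phrase(tokens, p) for p in phrases))
```

Under this model a search for book does not match a page containing only books or booklet, a search for inter matches inter-national but not international, and the quoted phrase "pRIME mINISTER" matches Prime Minister while "ime Min" does not.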

Search Engine Examples with Semantic Aspects

Semantic search features offered by Google

Facet-based search in biomedical domain: Example: Semedico

Cluster-search: Example: Carrot2

Natural Language Processing facilitated search: Example: EasyAsk for commercial websites

Semantic search on MarineLives Semantic Media Wiki

The MarineLives wiki is a Semantic Media Wiki. For technological background see Rowan Beentje, 'Tech Talk', June 27th 2016.

The semantic features of the wiki offer the ability to specify semantic searches.

Semantic search query form using MarineLives Special Ask API
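Semantic MediaWiki accepts ask queries both through the Special:Ask page and through the MediaWiki API's ask action. The sketch below shows how such a query might be assembled in Python. The API address (example.org), the category Deponent and the properties Occupation, Age and Residence are hypothetical placeholders, not the actual MarineLives endpoint or data model.

```python
from urllib.parse import urlencode


def build_ask_query(conditions, printouts, limit=20):
    """Assemble an ask query string: [[conditions]] then |?printouts then |limit."""
    query = "".join(f"[[{c}]]" for c in conditions)
    query += "".join(f"|?{p}" for p in printouts)
    return query + f"|limit={limit}"


def build_ask_url(api_base, conditions, printouts, limit=20):
    """Return a URL that calls the API's ask action with the assembled query."""
    params = {
        "action": "ask",
        "query": build_ask_query(conditions, printouts, limit),
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# Hypothetical example: mariners among the deponents, with age and residence.
EXAMPLE_URL = build_ask_url(
    "https://example.org/w/api.php",  # placeholder address, not the real wiki
    ["Category:Deponent", "Occupation::Mariner"],
    ["Age", "Residence"],
    limit=5,
)
```

Each condition in double square brackets narrows the set of pages, and each ?Property printout adds a column to the result, so a search form only has to collect those two lists from the user.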

Search screens

National Archives - Advanced Search home page

BritishLibrary - Online catalogue access page

British History Online - Home page

Connected Histories - Home page

Useful links

National Archives advanced search
British Library catalogue search
British History online search
Connected Histories search
Semantic MediaWiki

Semantic Media Wiki Examples

Semantic mediawiki capabilities

Semantic extension: Google Maps format
Semantic extension: OpenLayers format
Semantic extension: KML format

Bibliography

Aula, Anne, Rehan M. Khan, Zhiwei Guan, 'How does Search Behaviour Change as Search Becomes More Difficult', CHI, Atlanta, Georgia, April 10-15, 2010, viewed 05/07/2016
- Presents data from investigation of search failures in small-scale lab studies as well as search engine-logs, looking at signals of user frustration
- Citations of interest include:
-- Aula, A., Majaranta, P., and Räihä, K.-J. (2005) Eye tracking reveals the personal styles for search result evaluation. Proceedings of Human-Computer Interaction - INTERACT 2005, 1058-1061.
-- Brand-Gruwel, S., Wopereis, I. and Vermetten, Y. (2005) Information problem solving by experts and novices: analysis of a complex cognitive skill. Computers in Human Behaviour, 21, 487-508.
-- Jansen, B. and Spink, A. (2006) How are we searching the world wide web? A comparison of nine search engine transaction logs. Information Processing and Management, 42, 248-263.

Grimes, Seth (January 21, 2010). "Breakthrough Analysis: Two + Nine Types of Semantic Search". InformationWeek, viewed 02/07/2016

Haase, Peter, Daniel Herzig, Mark Musen, Thanh Tran, 'Semantic Wiki Search', in L. Aroyo et al. (eds.), ESWC 2009, LNCS 5554, pp.445-460, 2009
- Proposes a search interface to combine the expressiveness and capabilities of structured queries with the simplicity known from keyword interfaces and faceted browsing, which are easier to handle for lay end users (p.446)
- Breaks search process into (1) Articulation of the information need (2) Query interpretation using keyword translation (3) Result presentation and refinement
- Proposes a workflow in which (1) Information needs are articulated as key words (2) user queries are translated into structured conjunctive queries (3) conjunctive queries are presented to the user and can be refined by the user, following the paradigm of faceted browsing, adding or removing facets to broaden or narrow the query.
- MediaWiki supports the hierarchical organisation of categories, and Semantic Media Wiki (SMW) can be configured to interpret this as an OWL class hierarchy
- SMW also has a special property of "subproperty of" that can be used for property hierarchies
- Conjunctive queries fall into three categories (1) Entity Queries (2) Fact Queries (3) General Conjunctive Queries
- Entity Queries correspond to a wiki page
- Fact Queries are queries for concrete properties of particular objects, and correspond to one (or more) statements on a page, but not to a page
- General Conjunctive Queries are queries which allow the retrieval of multiple examples of a general conjunctive enquiry, which may span multiple statements on many pages
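
The three query categories in the notes above can be illustrated with a toy example in Python. This is our own reading of the notes, not the implementation described by Haase et al.: the statements, page names and property names are invented, and a real semantic wiki would hold the statements in its own store.

```python
# Each statement is (page, property, value). All data here is invented.
STATEMENTS = [
    ("John Smith", "Occupation", "Mariner"),
    ("John Smith", "Residence", "Wapping"),
    ("Mary Jones", "Occupation", "Victualler"),
    ("Mary Jones", "Residence", "Wapping"),
]


def entity_query(page):
    """Entity query: everything stated on one wiki page."""
    return [(prop, value) for subj, prop, value in STATEMENTS if subj == page]


def fact_query(page, prop):
    """Fact query: the value(s) of one property of one particular object."""
    return [value for subj, p, value in STATEMENTS if subj == page and p == prop]


def conjunctive_query(conditions):
    """General conjunctive query: pages meeting every (property, value) condition."""
    pages = {subj for subj, _, _ in STATEMENTS}
    return sorted(
        page for page in pages
        if all((page, prop, value) in STATEMENTS for prop, value in conditions)
    )
```

Adding or removing a condition in the list passed to conjunctive_query narrows or broadens the result, which mirrors the facet-style refinement step in the proposed workflow.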

Ruiz-Montiel, Manuela, Joaquin J. Molina-Castro, Jose F. Aldana-Montes, 'TasTicWiki: A Semantic Wiki with Content Recommendation', Conference: 5th Workshop on Semantic Wikis- Linking Data and People; 7th Extended Semantic Web Conference. Hersonissos, Crete, Greece, June 2010. pp. 31-40

Solomou, Georgia, Dimitrios Koutsomitropoulos, 'Towards an evaluation of semantic searching in digital repositories: a DSpace case-study', Electronic Library and Information Systems, Vol. 49 No. 1, 2015, pp.63-90
- Looks at semantic search in DSpace as an example of a popular content-management system (other examples being EPrints, Digital Commons, CONTENTdm, ETD-db, and Fedora)

Veja, Cornelia, Christoph Schindler, Basil Ell, Semantic MediaWiki Based Virtual Research Environments: The Case of Semantic Collaborative Corpora Analysis, Barcelona, 28-30.10.2015

Terminology

Exploratory Search

Wikipedia article: Exploratory Search

Semantic Search

Wikipedia article: Semantic Search
- Navigational search vs. Research search
- Research search = providing search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. The user is attempting to locate a number of documents, which in their entirety will provide the desired information.

- Research search is similar to Exploratory search

[1] Durham University: Dr Andy Burn profile

[2] Durham University: Dr Andy Burn profile

[3] Zotero Forums: Plugins: Scrivener and Zotero integration

[4] Mediawiki - Help:Searching
