Tools: Team Two

From MarineLives
Revision as of 13:20, July 5, 2016 by ColinGreenstreet (Talk | contribs) (Bibliography)


Team two: tailored search of historical documents

Team summary


We will explore how historians approach historical search when they are looking for people, places and dates. We will look at search engines employed by archives and libraries such as the National Archives and the British Library, at search tools provided by digital resources such as British History Online, and at federated search tools such as Connected Histories. We will also look at search tools, glossaries, and lookup tables on the MarineLives wiki. Our focus will be on how historians really work, and on how technology can make the day-to-day task of historical search faster and more effective.

An explicit goal of team two will be to understand the semantic properties of the MarineLives semantic media wiki. This wiki was implemented in May 2015 by one of our volunteers, Rowan Beentje. With four million words of full text, over 10,000 manuscript images and over 20,000 pages, improved search will have a dramatic impact for all users of the wiki. A number of potential semantic search plug-ins exist, and we would like our volunteers to specify the functionality our users need and to explore the appropriate semantic search solution.



Historians - we need your input


The MarineLives Digital Pop Up Lab team would like to interview historians about their use of different search engines.

We are seeking to develop a detailed understanding of the types of searches historians perform and wish to perform, and the extent to which current search engines meet their needs.

We would like to explore with historians the specific functionality they would like to see for the MarineLives search engine.

All input will go into our development of a new semantic based search engine for the MarineLives wiki.

The five search engines we are interested in are: (1) The National Archives, Kew: Discovery search engine; (2) British Library catalogue; (3) British History Online; (4) Connected Histories; and (5) the MarineLives wiki.

Please contact us if you would like to do a fifteen minute Skype interview with one of our team.



Historians' use of historical search engines - planned interviews

By date


  • Wednesday, June 29th 2016 @ 11 a.m.: John Levin, PhD candidate, University of Sussex, @anterotesis [COMPLETED - INTERVIEW NOTE TO FOLLOW]


  • Friday July 1st 2016 @ 9.30 a.m. (UK time): Thierry Daunois, University of Lorraine [COMPLETED - INTERVIEW NOTE TO FOLLOW]


  • Friday July 1st 2016 @ 10.45 a.m.: Harriet Richardson, architectural historian, @FredaWorley [COMPLETED - INTERVIEW NOTE TO FOLLOW]



  • Wednesday, July 6th 2016 @ 10 a.m.: Dr Andy Burn, Postdoctoral Research Assistant on 'Social Relations and Everyday Life in England 1500-1640', University of Durham, @aj_burn


  • Wednesday July 6th 2016 @ 2 pm: Dr Cathryn Pearce, marine historian, @CathrynPearce


  • Friday July 8th 2016 @ 10 a.m.: Dr James Brown, Research Associate, Intoxicants Project, University of Sheffield, @intoxproject


Date and time to be decided




Historical search interview guide


We would like to ask early modern historians of all types (social, economic, political, material, cultural, maritime) the following questions in a fifteen minute Skype interview:

(1) What is your experience of historical research?

- level of study (undergraduate, masters, PhD candidate, post-doctoral, early career scholar, established researcher)?
- types of historical research performed?

(2) What search engines do you use to discover and access historical data?

- Google, archival search engines, library search engines, specialised search engines
- Do you use:
-- English National Archives Discovery search engine?
-- British Library catalogue search engines? [If so, which]
-- British History Online?
-- Connected Histories?

(3) What hardware do you use to access historical data?

-- personal: laptop, desktop, iPad, mobile phone?
-- institutional: library terminal?

(4) Choose one historical search engine that you find particularly useful, and talk us through how you use it:

-- Do you structure a search strategy in advance of starting to work with the search engine?
-- Do you write down a search strategy?
-- Do you identify key words or phrases to search for?

(5) Staying with the historical search engine you have chosen in your response to (4), talk us through how you would:

-- research a person?
-- research a place?

(6) How do you capture the results of your searches?

-- do you create a word document or Excel spreadsheet in which to store the searches?
-- do you keep a record of the search terms which generated your results?
-- do you store the search results and extracts of the records they refer to in the same word document or Excel spreadsheet?

(7) How do you sequence your searches and your use of search engines?

-- do you work methodically through predefined search terms in one search engine and then move on to the next one?
-- do you have multiple search engines open at the same time and move backwards and forwards between them in response to specific research results?

(8) Have you performed searches on the MarineLives wiki?

-- If so, please tell us about your experience of searching the MarineLives wiki?
-- What tools have you used to find data on the wiki (vertical sidebar; lists of deponents; thematic pages; search box in top right hand corner of each wiki page)?
-- What improvements would you like to see to MarineLives wiki searchability and discoverability?



Interview guide with Thierry Daunois


(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

Interview notes on historical search strategies and use of search engines



Louise Falcini, PhD candidate, University of Reading


Skype interview: Monday June 27th 2016 @ 11 a.m.
Interviewee: Louise Falcini @louisefalcini Academia.edu profile
Interviewer: Colin Greenstreet @Marinelivesorg Academia.edu profile

Q1

Louise Falcini is a highly experienced archivist, who used to work at the London Metropolitan Archives. She is now at the write-up stage of her PhD dissertation, which she has embarked on as a mature student. Her dissertation concerns the London poor in the long C18th. Her PhD dissertation supervisor is Professor Tim Hitchcock.

Q2

Louise approaches historical search with a very clear sense of what she is looking for, and has a good sense of the location and nature of the archives that are likely to yield fruitful information.

Her use of search engines for research is focussed, with very limited use of serendipitous or wild card approaches.

Asked to name the four or five main search engines she uses, Louise named without prompting (in the order of their naming): (1) The National Archives Discovery search engine - simple, not "advanced", screen (2) London Metropolitan Archives online catalogue (3) A2A [Archives 2 Archives], now part of the National Archives Discovery search engine (4) London Lives. On prompting, Louise named four additional search engines: (1) Connected Histories (2) British History Online (3) Old Bailey Online, which she accesses through London Lives (4) British Library.

Louise makes considerable use of TNA Discovery Engine, LMA online catalogue, and London Lives. She rarely uses the British Library Online Catalogues.

Q3

Louise works primarily at home from her desktop computer. She has two screens, one larger than the other. She does not use library or archival terminals for search, though she will use them to order up material.

Q4

Louise identifies sets of potential search terms to support a specific research strategy and will then implement them in the appropriate search engine or online catalogue.

She makes use of "Sounds like" features, where they are available. An example of this is for London Lives. See London Lives information on search engine functionality

Louise makes considerable use of Google Books when searching for secondary sources. She also uses Early English Books Online and Echo.

Locating London's Past is useful for its detailed maps, including John Rocque's 1746 map of London. It also has useful functionality allowing the mapping of key words.

Q5

This question was not discussed.

Q6

Louise captures the results of her searches in the software package Zotero.

Some search engines allow autosave of data into Zotero, for example Old Bailey Online. See Organising Your Research With Reference Management Tools (e.g. Zotero)

She regularly shares her Zotero files with her PhD supervisor, who will comment on new potential sources and use of existing sources, but does not directly annotate the Zotero files.

Q7

This question was not discussed.

Q8

Louise has not performed research oriented searches on the MarineLives wiki, since the mid-C17th falls outside the period of interest for her PhD dissertation. She was therefore unable to comment on search functionality and user experience of the MarineLives wiki.



John Levin, PhD candidate, University of Sussex


[ADD DATA]

Q1

John has Master's degrees in history and digital humanities from King's College London and from University College London. He is currently a full-time PhD candidate at the University of Sussex, where he is supervised by Professor Tim Hitchcock and XXXX.

Q2

Q3

Q4

Q5

Q6

Q7

Q8


Thierry Daunois, University of Lorraine


(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

[ADD DATA]

Q1

Thierry's undergraduate education was at the University of Reims. He subsequently started his own marketing services business serving small French businesses, which he ran for five years. He then returned to higher education and studied for a Masters in Business Intelligence at the University of Nancy, graduating in 2009. Since then, Thierry has been employed at the University of Lorraine in Nancy, working for the Central Direction responsible for University partnering with external bodies. In that role, Thierry has acquired considerable experience of semantic media wiki technology being used by different scientific partners. He is currently considering embarking on a PhD programme in digital humanities, which would start in September 2016.

Q2

Q3

Q4

Q5

Q6

Q7

Q8


Harriet Richardson, Survey of London


[ADD DATA]

Q1

Harriet's undergraduate studies in English were at the University of Nottingham. She then studied for an MLitt in architectural history at the University of St Andrews. Following graduate studies, Harriet worked on a study of Scottish hospitals, and later moved to London, where she joined the Survey of London.

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr Jenni Hyde



Q1


Dr Jenni Hyde was awarded her PhD from Manchester in 2015. She is a part-time lecturer at Liverpool Hope University and an honorary member of the History Department at Lancaster University.
She specialises in Early Modern ballads, usually C16th, with a second research interest in Protestant martyrs under Queen Mary.
Jenni's research treats ballads as an historical and musical resource, drawing on her earlier professional career as a music teacher.
She is based near Preston in Lancashire.

Q2


Jenni named three top search engines, which she uses in her research.
(1) Google (2) Early English Books Online (EEBO) (3) State Papers Online
When working on a new research topic, Jenni will start with Google. If she gets too many results for a given search in Google, she will use the same search terms but filtered through Google Books and/or Google Scholar and/or Google Images.
On further prompting, Jenni mentioned using (4) British History Online (5) The National Archives Discovery - Advanced Search.
The content of Old Bailey Online is too late for Jenni's work.

Q3


Jenni works predominantly from home, where she uses a laptop, without a second screen. She also has a tablet, which she uses in archives and libraries to take notes.
When working on her laptop, Jenni will have up to ten windows open at any one time. She will close windows when the screen gets too full.

Q4


We took Google as an example to explore how Jenni works with a search engine.
A typical search would be to look in Google for a line of a ballad. A line of a ballad could contain up to ten words, all of which Jenni will enter into the search box, constrained by inverted commas. She will remove the inverted comma constraint if nothing turns up on the search. She will also try alternative spellings - sometimes Google will come up with alternative spellings through its algorithms, but it is nevertheless worth adjusting the search terms for alternative spellings as well.
Jenni works with the default Google setting of ten results per page, and will work through up to five or six screens of ten results per page, before trying something new. Her rule of thumb for initiating a new search is when the results on a given results page of ten results start to look very obviously not relevant.
Easy accessibility is important in determining which search engines Jenni uses - she mentioned that Google Scholar was less visible as a Google service than Google, Google Books and Google Images, which she found irritating.
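
Jenni's progression from a strict phrase search to looser variants can be sketched in code. The following is a minimal illustration in Python, not a description of how Google itself works: the function name `query_variants`, the sample line and the spelling table are all hypothetical.

```python
from typing import Dict, Iterator, List


def query_variants(line: str, spellings: Dict[str, List[str]]) -> Iterator[str]:
    """Yield search strings for a ballad line, from strictest to loosest."""
    # 1. Exact phrase, constrained by inverted commas.
    yield f'"{line}"'
    # 2. The same words without the phrase constraint.
    yield line
    # 3. One further variant per alternative spelling of each word.
    words = line.split()
    for i, word in enumerate(words):
        for alt in spellings.get(word.lower(), []):
            yield " ".join(words[:i] + [alt] + words[i + 1:])


# Hypothetical line and spelling table, invented for illustration.
for query in query_variants("my love is gone", {"love": ["loue"]}):
    print(query)
```

Each string would be tried in turn, moving to the next only when the previous one returns nothing useful.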

State Papers Online have quite good fuzzy search functionality, but it is hard to browse results around a given result.
For example, if the term "Thomas Cromwell" produced an interesting result in SPOnline, Jenni will probably want to look at entries for one or two weeks before and after that entry, but it is not easy to get to those results and to browse them.

Q5


We worked through Early English Books Online as a second example of historical search.
Jenni makes great use of this resource, since it contains key content. However, she finds the resource slightly annoying to use, and criticised some of its functionality.
She stated that "all the content is there, but if [you] don't put in exactly the right terms you don't get easily to the content".
An example of this need for exactness is search by bibliographical reference. The inputted reference needs to follow the exact format recognised by the EEBO search engine, including spacing between letters and numbers.
Jenni also criticised the EEBO "sounds-like" function, stating that key word search claims to have a sounds-like function, but that for her purposes it was "not fuzzy enough", and that the degree of fuzziness could not be controlled.
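
The wish for controllable fuzziness can be illustrated with a search in which the user sets the matching threshold. The sketch below is a hedged illustration using Python's standard `difflib` module. It measures spelling similarity rather than sound, so it is not EEBO's "sounds-like" algorithm, and the function name `fuzzy_matches` and the sample names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def fuzzy_matches(term: str, candidates: List[str], threshold: float = 0.8) -> List[str]:
    """Return candidates whose spelling similarity to term meets the threshold.

    threshold runs from 0.0 (match anything) to 1.0 (exact match only),
    so lowering it makes the search "fuzzier". Best matches come first.
    """
    term = term.lower()
    scored = [
        (SequenceMatcher(None, term, candidate.lower()).ratio(), candidate)
        for candidate in candidates
    ]
    return [candidate for score, candidate in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical index terms, invented for illustration.
names = ["Crumwell", "Cranmer", "Cromwel"]
print(fuzzy_matches("Cromwell", names, threshold=0.8))  # stricter search
print(fuzzy_matches("Cromwell", names, threshold=0.5))  # fuzzier search
```

Exposing the threshold as a parameter is the design point: the same index can serve both a cautious search and a deliberately loose one.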

Jenni would like to see two specific improvements to EEBO search:
(1) More "flexibility on results"
(2) Semantic date search, with the EEBO search engine recognising dates contained in the full text of documents and being able to produce results which reference those dates, rather than simply the recognised date of the document. In the case of C16th documents in EEBO, many documents are simply dated 1500-1599. As a result it is hard to date constrain results, and Jenni gets too many results.
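
A very rough sketch of the second improvement, searching on dates found in the full text rather than on the catalogue date, might look as follows in Python. It is an illustration under stated assumptions, not EEBO functionality: it only recognises four-digit years written in numerals (regnal years and dates written in words would need further work), and the function names and sample documents are hypothetical.

```python
import re
from typing import List, Tuple

# Four-digit years beginning with 1, e.g. 1536 or 1599.
YEAR = re.compile(r"\b1[0-9]{3}\b")


def years_mentioned(text: str) -> List[int]:
    """Return the distinct years that appear in the text, in ascending order."""
    return sorted({int(year) for year in YEAR.findall(text)})


def filter_by_text_year(docs: List[Tuple[str, str]], start: int, end: int) -> List[str]:
    """Return titles of documents whose full text mentions a year in [start, end]."""
    return [
        title
        for title, text in docs
        if any(start <= year <= end for year in years_mentioned(text))
    ]


# Hypothetical documents, invented for illustration.
docs = [
    ("Ballad A", "Printed at London, 1569, for the author."),
    ("Sermon B", "Preached in the year 1588 and again in 1590."),
    ("Tract C", "No date appears in this text."),
]
print(filter_by_text_year(docs, 1560, 1570))
```

A document catalogued only as "1500-1599" could then still be retrieved by a narrower date range, provided its text mentions a year.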

Q6


Jenni's preferred storage method for her research results is large Word documents. In the past she has tried using Evernote, which takes text off a page and stores it in a notebook, but has not found this tool very useful and has discontinued using it.

When starting work on a new research theme (e.g. writing an article on the Pilgrimage of Grace) she will create a new Word document named "Pilgrimage of Grace". She will then paste research results and document extracts into this Word document. These documents can become as large as 1000 Word pages. She will then search within the Word document using standard Word search tools when synthesising the material. She tends to keep the Word document going until she has finished her work on that theme, rather than creating multiple Word documents. At the end of a specific piece of work, she will archive these Word documents, sometimes combining several Word documents into larger Word documents for archiving purposes.

Jenni uses Excel and Access to manipulate data from these Word research documents, but will not tend to enter data directly from her searches into Excel and/or Access, preferring to go through the Word document intermediate stage.

Q7


See answer to Q4 regarding use of Google, and then filtering results via Google Books, Google Scholar and Google Images.

Q8


Jenni has not been an active searcher on the MarineLives wiki for her own research, so we did not pursue this question.


Dr Cathryn Pearce


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr James Brown, University of Sheffield


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr Andy Burn, University of Durham


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Search Engine Examples with Semantic Aspects


Semantic search features offered by Google



Facet-based search in biomedical domain: Example: Semedico



Cluster-search: Example: Carrot2



Natural Language Processing facilitated search: Example: EasyAsk for commercial websites

Semantic search on MarineLives Semantic Media Wiki


The MarineLives wiki is a Semantic Media Wiki. For technological background see Rowan Beentje, 'Tech Talk', June 27th 2016

The semantic features of the wiki offer the ability to specify semantic searches.
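
As a hedged illustration of what specifying a semantic search might look like, the Python sketch below assembles a query in Semantic MediaWiki's ask syntax and wraps it in a request URL for the `ask` API module. The API address, the `Language` property and the helper names are assumptions made for illustration only; the properties actually defined on the MarineLives wiki may differ.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_ask_query(conditions: Sequence[str], printouts: Sequence[str] = (),
                    limit: int = 50) -> str:
    """Assemble an ask query: [[conditions]], then |?printouts, then |limit."""
    query = "".join(f"[[{condition}]]" for condition in conditions)
    query += "".join(f"|?{printout}" for printout in printouts)
    return f"{query}|limit={limit}"


def build_ask_url(api_url: str, query: str) -> str:
    """Wrap an ask query in a URL for the wiki's api.php endpoint."""
    return api_url + "?" + urlencode({"action": "ask", "query": query, "format": "json"})


# Hypothetical example: pages in Category:Pages tagged with an assumed
# "Language" property, returning 50 results rather than a fixed page size.
query = build_ask_query(["Category:Pages", "Language::Dutch"], ["Language"], limit=50)
print(query)
print(build_ask_url("https://wiki.example.org/api.php", query))
```

The sketch only builds the request and sends nothing over the network.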


Search screens




Useful links


National Archives advanced search
British Library catalogue search
British History online search
Connected Histories search
Semantic MediaWiki



Semantic Media Wiki Examples



MarineLives Wiki


- 5 categories
-- Indexes (31 members)
-- Languages (6 members)
-- Pages (10,025 members)
-- Pages with broken file links (2 members)
-- Volumes (81 members)
- Why do pages not appear under the index? e.g. Dutch tagged pages; Spanish tagged pages?
- http://www.marinelives.org/wiki/Dutch
- http://www.marinelives.org/wiki/Spanish - displays in groups of 200, without any ability to determine the number of results per page



Other historical content


Listings of SMW-based history sites
- Engineering and Technology History Wiki
-- Contains articles, categories, sub-categories
-- displays pages and media in a specified category
- Logic Museum
-- Front page includes list of new articles; articles currently being worked on; main categories; other useful categories: Category:Manuscripts, Category:Projects, Category:Websites, Category:Philosophers, Category:Journals, Category:Encyclopedias, Category:Societies
- Clicking on a category lists all pages in a category


Semantic mediawiki capabilities


Semantic extension: Google Maps format
Semantic extension: OpenLayers format
Semantic extension: KML format


Bibliography


Aula, Anne, Rehan M. Khan, Zhiwei Guan, 'How does Search Behaviour Change as Search Becomes More Difficult?', CHI, Atlanta, Georgia, April 10-15, 2010, viewed 05/07/2016
- Presents data from investigation of search failures in small-scale lab studies as well as search engine-logs, looking at signals of user frustration
- Citations of interest include:
-- Aula, A., Majaranta, P., and Räihä, K.-J. (2005) Eye-tracking reveals the personal styles for search result evaluation. Proceedings of Human-Computer Interaction - INTERACT 2005, 1058-1061.
-- Brand-Gruwel, S., Wopereis, I. and Vermetten, Y. (2005) Information problem solving by experts and novices: analysis of a complex cognitive skill. Computers in Human Behavior, 21, 487-508.
-- Jansen, B. and Spink, A. (2006) How are we searching the world wide web? A comparison of nine search engine transaction logs. Information Processing and Management, 42, 248-263.

Grimes, Seth (January 21, 2010). "Breakthrough Analysis: Two + Nine Types of Semantic Search". InformationWeek, viewed 02/07/2016

Haase, Peter, Daniel Herzig, Mark Musen, Thanh Tran, 'Semantic Wiki Search', in L. Aroyo et al. (eds.), ESWC 2009, LNCS 5554, pp.445-460, 2009
- Proposes a search interface to combine the expressiveness and capabilities of structured queries with the simplicity known from keyword interfaces and faceted browsing, which are easier to handle for lay end users (p.446)
- Breaks search process into (1) Articulation of the information need (2) Query interpretation using keyword translation (3) Result presentation and refinement
- Proposes a workflow in which (1) Information needs are articulated as key words (2) user queries are translated into structured conjunctive queries (3) conjunctive queries are presented to the user and can be refined by the user, following the paradigm of faceted browsing, adding or removing facets to broaden or narrow the query.
- MediaWiki supports the hierarchical organisation of categories, and Semantic Media Wiki (SMW) can be configured to interpret this as an OWL class hierarchy
- SMW also has a special property of "subproperty of" that can be used for property hierarchies
- Conjunctive queries fall into three categories (1) Entity Queries (2) Fact Queries (3) General Conjunctive Queries
- Entity Queries correspond to a wiki page
- Fact Queries are queries for concrete properties of particular objects, and correspond to one (or more) statements on a page, but not to a page
- General Conjunctive Queries are queries which allow the retrieval of multiple results for a general conjunctive query, which may involve multiple statements on many pages
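
The three query categories can be made concrete with a small sketch. The Python below is a simplified illustration over an in-memory list of statements, not the implementation described by Haase et al.: the page titles, properties, values and function names are all invented, and the conjunctive query is restricted to conditions on a single page variable.

```python
from typing import Dict, List, NamedTuple, Optional


class Statement(NamedTuple):
    page: str   # wiki page the statement appears on
    prop: str   # property name
    value: str  # property value


# Hypothetical statements, invented for illustration.
STATEMENTS = [
    Statement("Deposition 1", "Language", "Dutch"),
    Statement("Deposition 1", "Place", "Amsterdam"),
    Statement("Deposition 2", "Language", "Spanish"),
    Statement("Deposition 3", "Language", "Dutch"),
    Statement("Deposition 3", "Place", "London"),
]


def entity_query(title: str) -> Optional[str]:
    """Entity query: the answer is a wiki page, if one with that title exists."""
    return title if any(s.page == title for s in STATEMENTS) else None


def fact_query(page: str, prop: str) -> List[str]:
    """Fact query: the answer is one or more statement values on a single page."""
    return [s.value for s in STATEMENTS if s.page == page and s.prop == prop]


def conjunctive_query(conditions: Dict[str, str]) -> List[str]:
    """Conjunctive query: every page on which all property/value conditions hold."""
    pages = sorted({s.page for s in STATEMENTS})
    return [
        page for page in pages
        if all(value in fact_query(page, prop) for prop, value in conditions.items())
    ]


print(entity_query("Deposition 2"))
print(fact_query("Deposition 1", "Place"))
print(conjunctive_query({"Language": "Dutch", "Place": "London"}))
```

Adding or removing an entry in `conditions` narrows or broadens the result set, which loosely mirrors the faceted refinement step in the proposed workflow.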

Ruiz-Montiel, Manuela, Joaquin J. Molina-Castro, Jose F. Aldana-Montes, 'TasTicWiki: A Semantic Wiki with Content Recommendation', Conference: 5th Workshop on Semantic Wikis- Linking Data and People; 7th Extended Semantic Web Conference. Hersonissos, Crete, Greece, June 2010. pp. 31-40

Solomou, Georgia, Dimitrios Koutsomitropoulos, 'Towards an evaluation of semantic searching in digital repositories: a DSpace case-study', Electronic Library and Information Systems, Vol. 49 No. 1, 2015, pp.63-90
- Looks at semantic search in DSpace as an example of a popular content-management system (other examples being EPrints, Digital Commons, CONTENTdm, ETB-db, and Fedora)

Veja, Cornelia, Christoph Schindler, Basil Ell, Semantic MediaWiki Based Virtual Research Environments: The Case of Semantic Collaborative Corpora Analysis, Barcelona, 28-30.10.2015


Terminology


Exploratory Search

Wikipedia article: Exploratory Search

Semantic Search

Wikipedia article: Semantic Search
- Navigational search vs. Research search
- Research search = providing search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. The user is attempting to locate a number of documents, which in their entirety will provide the desired information.
- Research search is similar to Exploratory search