Tools: Team Two

From MarineLives
Revision as of 09:20, July 4, 2016

Team two: tailored search of historical documents

Team summary


We will explore how historians approach historical search when they are looking for people, places and dates. We will look at search engines employed by archives and libraries such as the National Archives and the British Library, at search tools provided by digital resources such as British History Online, and at federated search tools such as Connected Histories. We will look at search tools, glossaries, and lookup tables on the MarineLives wiki. Our focus will be on how historians really work, and on how technology can make the day-to-day task of historical search faster and more effective.

An explicit goal of team two will be to understand the semantic properties of the MarineLives Semantic MediaWiki. This wiki was implemented in May 2015 by one of our volunteers, Rowan Beentje. With four million words of full text, over 10,000 manuscript images and over 20,000 pages, the wiki stands to gain a great deal from improved search, which will have a dramatic impact on all its users. A number of potential semantic search plug-ins exist, and we would like our volunteers to specify the functionality our users need and to explore the appropriate semantic search solution.



Historians - we need your input


The MarineLives Digital Pop Up Lab team would like to interview historians about their use of different search engines.

We are seeking to develop a detailed understanding of the types of searches historians perform and wish to perform, and the extent to which current search engines meet their needs.

We would like to explore with historians the specific functionality they would like to see for the MarineLives search engine.

All input will go into our development of a new semantic-based search engine for the MarineLives wiki.

The five search engines we are interested in are: (1) The National Archives, Kew: Discovery search engine; (2) British Library catalogue; (3) British History Online; (4) Connected Histories; and (5) the MarineLives wiki.

Please contact us if you would like to do a fifteen-minute Skype interview with one of our team.



Historians' use of historical search engines - planned interviews

By date


  • Wednesday, June 29th 2016 @ 11 a.m.: John Levin, PhD candidate, University of Sussex, @anterotesis [COMPLETED - INTERVIEW NOTE TO FOLLOW]


  • Friday July 1st 2016 @ 9.30 a.m. (UK time): Thierry Daunois, University of Lorraine [COMPLETED - INTERVIEW NOTE TO FOLLOW]


  • Friday July 1st 2016 @ 10.45 a.m.: Harriet Richardson, architectural historian, @FredaWorley [COMPLETED - INTERVIEW NOTE TO FOLLOW]


  • Monday, July 4th 2016 @ 10 a.m.: Dr Jenny Hyde, lecturer, Liverpool Hope University, @wallyberry


  • Monday July 4th 2016 @ 2 pm: Dr Cathryn Pearce, marine historian, @CathrynPearce [POSTPONED, NEW DATE & TIME TBC]


  • Tuesday July 5th 2016 @ 2 p.m.: Dr James Brown, Research Associate, Intoxicants Project, University of Sheffield, @intoxproject


  • Wednesday, July 6th 2016 @ 10 a.m.: Dr Andy Burn, Postdoctoral Research Assistant on 'Social Relations and Everyday Life in England 1500-1640', University of Durham, @aj_burn


Date and time to be decided




Historical search interview guide


We would like to ask early modern historians of all types (social, economic, political, material, cultural, maritime) the following questions in a fifteen-minute Skype interview:

(1) What is your experience of historical research?

- level of study (undergraduate, masters, PhD candidate, post-doctoral, early career scholar, established researcher)?
- types of historical research performed?

(2) What search engines do you use to discover and access historical data?

- Google, archival search engines, library search engines, specialised search engines
- Do you use:
-- English National Archives Discovery search engine?
-- British Library catalogue search engines? [If so, which]
-- British History Online?
-- Connected Histories?

(3) What hardware do you use to access historical data?

-- personal: laptop, desktop, iPad, mobile phone?
-- institutional: library terminal?

(4) Choose one historical search engine that you find particularly useful, and talk us through how you use it:

-- Do you structure a search strategy in advance of starting to work with the search engine?
-- Do you write down a search strategy?
-- Do you identify key words or phrases to search for?

(5) Staying with the historical search engine you have chosen in your response to (4), talk us through how you would:

-- research a person?
-- research a place?

(6) How do you capture the results of your searches?

-- do you create a Word document or Excel spreadsheet in which to store the searches?
-- do you keep a record of the search terms which generated your results?
-- do you store the search results and extracts of the records they refer to in the same Word document or Excel spreadsheet?

(7) How do you sequence your searches and your use of search engines?

-- do you work methodically through predefined search terms in one search engine and then move on to the next one?
-- do you have multiple search engines open at the same time and move backwards and forwards between them in response to specific research results?

(8) Have you performed searches on the MarineLives wiki?

-- If so, please tell us about your experience of searching the MarineLives wiki.
-- What tools have you used to find data on the wiki (vertical sidebar; lists of deponents; thematic pages; search box in the top right-hand corner of each wiki page)?
-- What improvements would you like to see to MarineLives wiki searchability and discoverability?



Interview guide with Thierry Daunois


(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

Interview notes on historical search strategies and use of search engines



Louise Falcini, PhD candidate, University of Reading


Skype interview: Monday June 27th 2016 @ 11 a.m.
Interviewee: Louise Falcini @louisefalcini Academia.edu profile
Interviewer: Colin Greenstreet @Marinelivesorg Academia.edu profile

Q1

Louise Falcini is a highly experienced archivist, who used to work at the London Metropolitan Archives. She is now at the write-up stage of her PhD dissertation, which she has embarked on as a mature student. Her dissertation concerns the London poor in the long C18th. Her PhD dissertation supervisor is Professor Tim Hitchcock.

Q2

Louise approaches historical search with a very clear sense of what she is looking for, and with a good sense of the location and nature of the archives likely to yield fruitful information.

Her use of search engines for research is focussed, with very limited use of serendipitous or wild card approaches.

Asked to name the four or five main search engines she uses, Louise named without prompting (in the order named): (1) The National Archives Discovery search engine - the simple, not the "advanced", screen (2) London Metropolitan Archives online catalogue (3) A2A [Access to Archives], now part of the National Archives Discovery search engine (4) London Lives. On prompting, Louise named four additional search engines: (1) Connected Histories (2) British History Online (3) Old Bailey Online, which she accesses through London Lives (4) British Library.

Louise makes considerable use of TNA Discovery Engine, LMA online catalogue, and London Lives. She rarely uses the British Library Online Catalogues.

Q3

Louise works primarily at home on her desktop computer. She has two screens, one larger than the other. She does not use library or archival terminals for search, though she will use them to order up material.

Q4

Louise identifies sets of potential search terms to support a specific research strategy and will then implement them in the appropriate search engine or online catalogue.

She makes use of "Sounds like" features where they are available; London Lives is one example. See London Lives information on search engine functionality.
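The notes do not say which algorithm London Lives uses for "Sounds like" matching. As an illustration only, one widely used phonetic scheme is American Soundex, which reduces a name to its first letter plus three digits so that variant spellings such as Smith and Smyth share a code. A minimal sketch in Python, assuming that scheme (the name `soundex` is our own):

```python
# American Soundex letter groups; vowels and H, W, Y carry no code.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]              # the first letter is always kept
    prev = _CODES.get(letters[0], "")  # its code still blocks an identical next code
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        if ch not in "HW":
            # Vowels reset prev, so a repeated code after a vowel is kept;
            # H and W are transparent, so a repeated code across them is dropped.
            prev = code
    return "".join(result).ljust(4, "0")  # pad short codes with zeros
```

Comparing codes rather than spellings is what lets a "sounds like" search match Smyth to Smith. Soundex was designed for English surnames and always keeps the first letter, so it will miss early modern variants that begin differently.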

Louise makes considerable use of Google Books when searching for secondary sources. She also uses Early English Books Online and Echo.

Locating London's Past is useful for its detailed maps, notably John Rocque's 1746 map of London. It also has useful functionality for mapping keywords.

Q5

This question was not discussed.

Q6

Louise captures the results of her searches in the software package Zotero.

Some search engines allow autosave of data into Zotero, for example Old Bailey Online. See Organising Your Research With Reference Management Tools (e.g. Zotero)

She regularly shares her Zotero files with her PhD supervisor, who will comment on new potential sources and use of existing sources, but does not directly annotate the Zotero files.

Q7

This question was not discussed.

Q8

Louise has not performed research oriented searches on the MarineLives wiki, since the mid-C17th falls outside the period of interest for her PhD dissertation. She was therefore unable to comment on search functionality and user experience of the MarineLives wiki.



John Levin, PhD candidate, University of Sussex


[ADD DATA]

Q1

John has Master's degrees in history and digital humanities from King's College London and University College London. He is currently a full-time PhD candidate at the University of Sussex, where he is supervised by Professor Tim Hitchcock and XXXX.

Q2

Q3

Q4

Q5

Q6

Q7

Q8


Thierry Daunois, University of Lorraine


(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

[ADD DATA]

Q1

Thierry's undergraduate education was at the University of Reims. He subsequently started his own marketing services business serving small French businesses, which he ran for five years. He then returned to higher education and studied for a Master's in Business Intelligence at the University of Nancy, graduating in 2009. Since then, Thierry has been employed at the University of Lorraine in Nancy, working for the Central Direction responsible for the University's partnerships with external bodies. In that role, Thierry has acquired considerable experience of Semantic MediaWiki technology as used by different scientific partners. He is currently considering embarking on a PhD programme in digital humanities, which would start in September 2016.

Q2

Q3

Q4

Q5

Q6

Q7

Q8


Harriet Richardson, Survey of London


[ADD DATA]

Q1

Harriet's undergraduate studies in English were at the University of Nottingham. She then studied for an MLitt in architectural history at the University of St Andrews. After her graduate studies, Harriet worked on a study of Scottish hospitals, and later moved to London, where she joined the Survey of London.

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr Jenny Hyde



Q1


Early Modern ballads, usually C16th. Finished PhD at Manchester 2015.
Had an email exchange regarding Protestant martyrs some time ago.
Ballads as a historical and musical resource. Used to be a music teacher.
Lecturer part-time at Liverpool Hope, 5 hours per week. Honorary member of History Department at Lancaster. Based outside Preston

Q2


Top five search engines
Google, EEBO and State Papers Online - the main three
Start on Google, and then filter through Google Books, Google Scholar and Google Images
British History Online
Old Bailey Online is too late for Jenni's work
TNA - advanced search access

Q3


Work from home; use laptop; also have a tablet - use tablet for notes in archives
Don't have a second screen
Up to ten windows open at any one time. Close them when screen gets too full.

Q4


Searching with Google
e.g. a line of a ballad - up to ten words; constrain with inverted commas - take them off if nothing turns up; also try alternative spellings - sometimes Google will come up with alternative spellings
Usually leave at ten results per page
Up to five or six screens until start to look very obviously not relevant
What does she do with results? Has used Evernote - takes text off the page and stores it in a notebook; tends to paste results into a Word document.
Tends to have up to 1,000 pages of Word in one document, which can then be searched in Word.
How does she name the document? Usually around a theme, e.g. "Article on the Pilgrimage of Grace working document" - tends not to date them, keeps going until finished. May then archive the document along with everything else.
Uses Word for keeping search records. Uses Excel and Access to hold results extracted from Word for later use.

Example of filters: if too much comes up on a Google search, use the same search terms in Google Books and Google Scholar - Scholar is hard to get to, as it is not on the main list
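Trying alternative spellings by hand could be partly automated. The sketch below is an assumption, not a tool mentioned in the interview: it applies a small, hypothetical set of early modern spelling alternations (u/v, i/j, i/y, and an optional final -e after a consonant) to each word of a phrase, then joins the quoted variants with Google's OR operator. The rules are illustrative and far from complete, and the number of variants grows quickly with phrase length, hence the `limit` parameter.

```python
from itertools import islice, product

# Hypothetical, illustrative spelling alternations; not an exhaustive list.
SWAPS = [("v", "u"), ("u", "v"), ("j", "i"), ("y", "i"), ("i", "y")]
VOWELS = set("aeiouy")


def word_variants(word: str) -> list[str]:
    """Return sorted spelling variants of one lower-case word."""
    variants = {word}
    for old, new in SWAPS:
        # Each alternation is applied to every variant found so far.
        variants |= {w.replace(old, new) for w in variants}
    # Optionally add a final -e after a consonant, e.g. "king" -> "kinge".
    variants |= {w + "e" for w in variants if w and w[-1] not in VOWELS}
    return sorted(variants)


def phrase_variants(phrase: str, limit: int = 20) -> list[str]:
    """Combine word variants into at most `limit` phrase variants."""
    words = phrase.lower().split()
    combos = product(*(word_variants(w) for w in words))
    return [" ".join(c) for c in islice(combos, limit)]


def google_query(phrase: str, limit: int = 5) -> str:
    """Quote each phrase variant and join them with OR."""
    return " OR ".join(f'"{p}"' for p in phrase_variants(phrase, limit))
```

For example, `google_query("the king", limit=2)` yields `"the king" OR "the kinge"`. Quoting each variant keeps the phrase constraint Jenny already uses, while OR widens the search without dropping the inverted commas.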

Q5


EEBO.
What like about EEBO - slightly annoying - all the content is there, but if don't put in exactly the right terms, doesn
Can search by bibliographic number, but it must be EXACT, e.g. Short Title Catalogue number - sensitive even to spaces
Claims to have a "sounds like" function on keyword search, but not fuzzy enough for Jenni's purposes
What improvements would she like to see for EEBO?
- more flexibility on results
- dating - if the date of an item is not known, results are often excessive in number; would like to be able to restrict by half century and decade. So many things in the C16th are of uncertain date; semantic date search would be helpful
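One way to allow restriction by decade or half century when many items are only roughly dated is to store each date as an earliest-latest range and test it against the chosen period. The sketch below assumes that representation, which the notes do not specify; the class and function names are hypothetical, and periods are assumed to start at multiples of 10 or 50. An item counts as "possibly" in a period if its range overlaps the period, and "certainly" in it if its range lies wholly inside.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A catalogue item whose date is known only to lie within a range of years."""
    title: str
    earliest: int  # earliest possible year
    latest: int    # latest possible year


def period(year: int, size: int) -> tuple[int, int]:
    """First and last year of the decade (size=10) or half century (size=50) containing year."""
    start = year - year % size
    return start, start + size - 1


def restrict(items: list[Item], year: int, size: int, certain: bool = False) -> list[Item]:
    """Keep items possibly (range overlaps) or certainly (range inside) in the period."""
    lo, hi = period(year, size)
    if certain:
        return [i for i in items if lo <= i.earliest and i.latest <= hi]
    return [i for i in items if i.earliest <= hi and i.latest >= lo]
```

For example, an item dated only to 1540-1560 is returned when restricting to the 1550s, but not when asking only for items certainly dated to that decade.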

Q6


Q7


Q8


Dr Cathryn Pearce


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr James Brown, University of Sheffield


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Dr Andy Burn, University of Durham


[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8



Search Engine Examples with Semantic Aspects


Google Advanced Patent Search




Facet-based search in biomedical domain: Example: Semedico



Cluster-search: Example: Carrot2



Natural Language Processing facilitated search: Example: EasyAsk for commercial websites

Semantic search on MarineLives Semantic Media Wiki


The MarineLives wiki is a Semantic MediaWiki. For technological background, see Rowan Beentje, 'Tech Talk', June 27th 2016.

The semantic features of the wiki offer the ability to specify semantic searches.
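Semantic MediaWiki exposes semantic queries both through the `#ask` parser function in wiki text and through an `ask` module of the MediaWiki API, which takes the same query syntax: conditions in double square brackets, printout properties prefixed with `?`, and parameters such as `limit` and `offset`, all separated by `|`. The sketch below only builds such a request URL and sends nothing. The api.php address is a guess (the page gives only /wiki/ URLs), and the category and property in the example are placeholders, so both would need checking against the live wiki.

```python
from collections.abc import Sequence
from urllib.parse import urlencode

# Assumed endpoint: the page gives only /wiki/ URLs, not the api.php path.
API_URL = "http://www.marinelives.org/w/api.php"


def ask_url(conditions: Sequence[str], printouts: Sequence[str] = (),
            limit: int = 50, offset: int = 0, api_url: str = API_URL) -> str:
    """Build (but do not send) a Semantic MediaWiki API 'ask' request URL."""
    parts = ["".join(f"[[{c}]]" for c in conditions)]  # consecutive conditions are ANDed
    parts += [f"?{p}" for p in printouts]                # properties to print for each result
    parts += [f"limit={limit}", f"offset={offset}"]      # paging parameters
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return f"{api_url}?{urlencode(params)}"


# Example: the first ten pages in Category:Volumes with their modification dates.
print(ask_url(["Category:Volumes"], ["Modification date"], limit=10))
```

Building the query as data rather than typing it into a URL makes it easy to add or remove conditions one at a time, which is the kind of refinement a semantic search interface would offer users.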


Search screens




Useful links


National Archives advanced search
British Library catalogue search
British History online search
Connected Histories search
Semantic MediaWiki



Semantic Media Wiki Examples



MarineLives Wiki


- 5 categories
-- Indexes (31 members)
-- Languages (6 members)
-- Pages (10,025 members)
-- Pages with broken file links (2 members)
-- Volumes (81 members)
- Why do pages not appear under the index, e.g. Dutch-tagged pages and Spanish-tagged pages?
- http://www.marinelives.org/wiki/Dutch
- http://www.marinelives.org/wiki/Spanish - displays results in groups of 200, with no way to set the number of results per page
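Letting readers choose how many results appear per page comes down to converting a page number and a page size into an offset and a count, which could then be passed, for example, as the `limit` and `offset` of a semantic query. A minimal sketch; the function name is hypothetical, and the default of 200 simply mirrors the current display:

```python
import math


def page_window(total: int, page: int, page_size: int = 200) -> tuple[int, int, int]:
    """Return (offset, count, number_of_pages) for a 1-based page number."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    pages = max(1, math.ceil(total / page_size))  # an empty result still has one page
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    offset = (page - 1) * page_size
    return offset, min(page_size, total - offset), pages
```

With a hypothetical 450 results and pages of 50, for example, `page_window(450, 2, 50)` returns `(50, 50, 9)`.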



Other historical content


Listings of SMW-based history sites
- Engineering and Technology History Wiki
-- Contains articles, categories, sub-categories
-- displays pages and media in a specified category
- Logic Museum
-- Front page includes list of new articles; articles currently being worked on; main categories; other useful categories: Category:Manuscripts, Category:Projects, Category:Websites, Category:Philosophers, Category:Journals, Category:Encyclopedias, Category:Societies
- Clicking on a category lists all pages in a category


Semantic mediawiki capabilities


Semantic extension: Google Maps format
Semantic extension: OpenLayers format
Semantic extension: KML format


Bibliography


Grimes, Seth (January 21, 2010). "Breakthrough Analysis: Two + Nine Types of Semantic Search". InformationWeek, viewed 02/07/2016

Haase, Peter, Daniel Herzig, Mark Musen, Thanh Tran, 'Semantic Wiki Search', in L. Aroyo et al. (eds.), ESWC 2009, LNCS 5554, pp.445-460, 2009
- Proposes a search interface that combines the expressiveness and capabilities of structured queries with the simplicity of keyword interfaces and faceted browsing, which are easier for lay end users to handle (p.446)
- Breaks search process into (1) Articulation of the information need (2) Query interpretation using keyword translation (3) Result presentation and refinement
- Proposes a workflow in which (1) Information needs are articulated as key words (2) user queries are translated into structured conjunctive queries (3) conjunctive queries are presented to the user and can be refined by the user, following the paradigm of faceted browsing, adding or removing facets to broaden or narrow the query.
- MediaWiki supports the hierarchical organisation of categories, and Semantic MediaWiki (SMW) can be configured to interpret this as an OWL class hierarchy
- SMW also has a special property of "subproperty of" that can be used for property hierarchies
- Conjunctive queries fall into three categories (1) Entity Queries (2) Fact Queries (3) General Conjunctive Queries
- Entity Queries correspond to a wiki page
- Fact Queries are queries for concrete properties of particular objects, and correspond to one (or more) statements on a page, but not to a page
- General Conjunctive Queries allow the retrieval of multiple results for a general conjunctive query, which may span multiple statements on many pages
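The three query types listed above can be illustrated with a toy triple store. The sketch below is a simplification for illustration, not Haase et al.'s system or SMW's own query engine: facts are (subject, property, value) triples, a conjunctive query is a list of patterns in which terms beginning with `?` are variables, and a result is every variable binding that satisfies all patterns at once. All page and property names are invented.

```python
# Toy facts as (subject, property, value) triples; all names are invented.
TRIPLES = [
    ("Deposition_1", "deponent", "Person_A"),
    ("Deposition_1", "mentions ship", "Ship_X"),
    ("Deposition_2", "deponent", "Person_B"),
    ("Deposition_2", "mentions ship", "Ship_X"),
    ("Person_A", "occupation", "mariner"),
    ("Person_B", "occupation", "merchant"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # a variable
            if new.setdefault(term, value) != value:
                return None                           # already bound to something else
        elif term != value:                           # a constant that does not match
            return None
    return new


def evaluate(query, triples=TRIPLES):
    """Return every binding that satisfies all patterns of a conjunctive query."""
    bindings = [{}]
    for pattern in query:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended                           # each pattern can only narrow
    return bindings
```

Here `evaluate([("?d", "mentions ship", "Ship_X")])` is an entity query returning two pages, `evaluate([("Person_A", "occupation", "?o")])` is a fact query returning `[{"?o": "mariner"}]`, and adding the patterns `("?d", "deponent", "?p")` and `("?p", "occupation", "mariner")` gives a general conjunctive query that narrows the result to Deposition_1. In the faceted-browsing workflow described above, adding a pattern narrows the query and removing one broadens it.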

Ruiz-Montiel, Manuela, Joaquin J. Molina-Castro, Jose F. Aldana-Montes, 'TasTicWiki: A Semantic Wiki with Content Recommendation', Conference: 5th Workshop on Semantic Wikis - Linking Data and People; 7th Extended Semantic Web Conference. Hersonissos, Crete, Greece, June 2010. pp. 31-40

Solomou, Georgia, Dimitrios Koutsomitropoulos, 'Towards an evaluation of semantic searching in digital repositories: a DSpace case-study', Electronic Library and Information Systems, Vol. 49 No. 1, 2015, pp.63-90
- Looks at semantic search in DSpace as an example of a popular content-management system (other examples being EPrints, Digital Commons, CONTENTdm, ETD-db, and Fedora)

Veja, Cornelia, Christoph Schindler, Basil Ell, 'Semantic MediaWiki Based Virtual Research Environments: The Case of Semantic Collaborative Corpora Analysis', Barcelona, 28-30.10.2015


Terminology


Exploratory Search

Wikipedia article: Exploratory Search

Semantic Search

Wikipedia article: Semantic Search
- Navigational search vs. Research search
- Research search = providing search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. The user is attempting to locate a number of documents, which in their entirety will provide the desired information.
- Research search is similar to Exploratory search