Difference between revisions of "Tools: Team Two"

Revision as of 13:39, July 6, 2016

Team two: tailored search of historical documents

1 Team summary
2 Historians - we need your input
3 Interview notes on historical search strategies and use of search engines
4 Standard search box on the MarineLives wiki
5 Search Engine Examples with Semantic Aspects
6 Semantic search on MarineLives Semantic Media Wiki
7 Search screens
8 Useful links
9 Semantic Media Wiki Examples
- 9.1 MarineLives Wiki
- 9.2 Other historical content
10 Semantic mediawiki capabilities
11 Bibliography
12 Terminology

Team summary

We will explore how historians approach historical search when they are looking for people, places and dates. We will look at search engines employed by archives and libraries such as the National Archives and the British Library, at search tools provided by digital resources such as British History Online, and at federated search tools such as Connected Histories. We will look at search tools, glossaries, and lookup tables on the MarineLives wiki. Our focus will be on how historians really work, and on how technology can be used to speed up the day-to-day task of historical search and make it more effective.

An explicit goal of team two will be to understand the semantic properties of the MarineLives semantic media wiki. This wiki was implemented in May 2015 by one of our volunteers, Rowan Beentje. With four million words of full text, over 10,000 manuscript images and over 20,000 pages, improved search will have a dramatic impact for all users of the wiki. A number of potential semantic search plug-ins exist, and we would like our volunteers to specify the functionality our users need and to explore the appropriate semantic search solution.

Historians - we need your input

The MarineLives Digital Pop Up Lab team would like to interview historians about their use of different search engines.

We are seeking to develop a detailed understanding of the types of searches historians perform and wish to perform, and the extent to which current search engines meet their needs.

We would like to explore with historians the specific functionality they would like to see for the MarineLives search engine.

All input will go into our development of a new semantic based search engine for the MarineLives wiki.

The five search engines we are interested in are: (1) The National Archives, Kew: Discovery search engine (2) British Library catalogue (3) British History Online (4) Connected Histories and (5) the MarineLives wiki.

Please contact us if you would like to do a fifteen-minute Skype interview with one of our team.

Historians' use of historical search engines - planned interviews

By date

Monday June 27th 2016 @ 11 a.m.: Louise Falcini, PhD candidate, @louisefalcini COMPLETED - SEE INTERVIEW NOTE

Wednesday, June 29th 2016 @ 11 a.m.: John Levin, PhD candidate, University of Sussex, @anterotesis [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Friday July 1st 2016 @ 9.30 a.m. (UK time): Thierry Daunois, University of Lorraine [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Friday July 1st 2016 @ 10.45 a.m.: Harriet Richardson, architectural historian, @FredaWorley [COMPLETED - INTERVIEW NOTE TO FOLLOW]

Monday, July 4th @ 10 a.m.: Dr Jenni Hyde, lecturer, Liverpool Hope University, @wallyberry COMPLETED - SEE INTERVIEW NOTE

Wednesday, July 6th 2016 @ 10 a.m.: Dr Andy Burn, Postdoctoral Research Assistant on 'Social Relations and Everyday Life in England 1500-1640', University of Durham, @aj_burn COMPLETED - SEE INTERVIEW NOTE

Wednesday July 6th 2016 @ 2 pm: Dr Cathryn Pearce, marine historian, @CathrynPearce

Friday July 8th 2016 @ 10 a.m.: Dr James Brown, Research Associate, Intoxicants Project, University of Sheffield, @intoxproject

Date and time to be decided

Professor James Daybell, [Day and time TBC] @JamesDaybell

Dr Nina Lamal, [Day and time TBC] @NinaLamal

Historical search interview guide

We would like to ask early modern historians of all types (social, economic, political, material, cultural, maritime) the following questions in a fifteen minute Skype interview:

(1) What is your experience of historical research?

- level of study (undergraduate, masters, PhD candidate, post-doctoral, early career scholar, established researcher)?
- types of historical research performed?

(2) What search engines do you use to discover and access historical data?

- Google, archival search engines, library search engines, specialised search engines
- Do you use:
-- English National Archives Discovery search engine?
-- British Library catalogue search engines? [If so, which]
-- British History Online?
-- Connected Histories?

(3) What hardware do you use to access historical data?

-- personal: laptop, desktop, iPad, mobile phone?
-- institutional: library terminal?

(4) Choose one historical search engine that you find particularly useful, and talk us through how you use it:

-- Do you structure a search strategy in advance of starting to work with the search engine?
-- Do you write down a search strategy?
-- Do you identify key words or phrases to search for?

(5) Staying with the historical search engine you have chosen in your response to (4), talk us through how you would:

-- research a person?
-- research a place?

(6) How do you capture the results of your searches?

-- do you create a word document or Excel spreadsheet in which to store the searches?
-- do you keep a record of the search terms which generated your results?
-- do you store the search results and extracts of the records they refer to in the same word document or Excel spreadsheet?

(7) How do you sequence your searches and your use of search engines?

-- do you work methodically through predefined search terms in one search engine and then move on to the next one?
-- do you have multiple search engines open at the same time and move backwards and forwards between them in response to specific research results?

(8) Have you performed searches on the MarineLives wiki?

-- If so, please tell us about your experience of searching the MarineLives wiki?
-- What tools have you used to find data on the wiki (vertical sidebar; lists of deponents; thematic pages; search box in top right hand corner of each wiki page)?
-- What improvements would you like to see to MarineLives wiki searchability and discoverability?

Interview guide with Thierry Daunois

(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

Interview notes on historical search strategies and use of search engines

Louise Falcini, PhD candidate, University of Reading

Skype interview: Monday June 27th 2016 @ 11 a.m.
Interviewee: Louise Falcini @louisefalcini Academia.edu profile
Interviewer: Colin Greenstreet @Marinelivesorg Academia.edu profile

Q1

Louise Falcini is a highly experienced archivist, who used to work at the London Metropolitan Archives. She is now at the write-up stage of her PhD dissertation, which she has embarked on as a mature student. Her dissertation concerns the London poor in the long C18th. Her PhD dissertation supervisor is Professor Tim Hitchcock.

Q2

Louise approaches historical search with a very clear sense of what she is looking for, and has a good sense of the location and nature of the archives, which are likely to yield fruitful information.

Her use of search engines for research is focussed, with very limited use of serendipitous or wild card approaches.

Asked to name the four or five main search engines she uses, Louise named without prompting (in the order of their naming): (1) The National Archives Discovery Search Engine - simple not "advanced" screen (2) London Metropolitan Archives online catalogue (3) A2A [Archives 2 Archives], now part of the National Archives Discovery search engine (4) London Lives. On prompting, Louise named four additional search engines: (1) Connected Histories (2) British History Online (3) Old Bailey Online, which she accesses through London Lives (4) British Library.

Louise makes considerable use of TNA Discovery Engine, LMA online catalogue, and London Lives. She rarely uses the British Library Online Catalogues.

Q3

Louise works primarily at home from her desktop computer. She has two screens, one larger than the other. She does not use library or archival terminals for search, though she will use them to order up material.

Q4

Louise identifies sets of potential search terms to support a specific research strategy and will then implement them in the appropriate search engine or online catalogue.

She makes use of "Sounds like" features, where they are available. An example of this is for London Lives. See London Lives information on search engine functionality

Louise makes considerable use of Google Books when searching for secondary sources. She also uses Early English Books Online and Echo.

Locating London's Past is useful for its detailed map, John Rocque's 1746 map of London. It also has useful functionality allowing the mapping of key words.

Q5

This question was not discussed.

Q6

Louise captures the results of her searches in the SW package Zotero.

Some search engines allow autosave of data into Zotero, for example Old Bailey Online. See Organising Your Research With Reference Management Tools (e.g. Zotero)

She regularly shares her Zotero files with her PhD supervisor, who will comment on new potential sources and use of existing sources, but does not directly annotate the Zotero files.

Q7

This question was not discussed.

Q8

Louise has not performed research oriented searches on the MarineLives wiki, since the mid-C17th falls outside the period of interest for her PhD dissertation. She was therefore unable to comment on search functionality and user experience of the MarineLives wiki.

John Levin, PhD candidate, University of Sussex

[ADD DATA]

Q1

John has Master's degrees in history and digital humanities from King's College London and from University College London. He is currently a full-time PhD candidate at the University of Sussex, where he is supervised by Professor Tim Hitchcock and XXXX.

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Thierry Daunois, University of Lorraine

(1) Listing key semantic design issues as you see them coming to the MarineLives semantic media wiki as a non-historian, but with a deep and wide understanding of what other scientists and scholars have done with a range of semantic media wikis

(2) Issues which may emerge from having adopted a Semantic Forms approach to structuring pages if we now wish to introduce semantic tagging of the full text transcription in the "Transcription" field

(3) Good reference projects for semantic media wikis that you recommend we study in our Lab over the next ten weeks, including people we might speak to on the technical side about the design and implementation of those semantic media wikis

[ADD DATA]

Q1

Thierry's undergraduate education was at the University of Reims. He subsequently started his own marketing services business serving small French businesses, which he ran for five years. He then returned to higher education and studied for a Masters in Business Intelligence at the University of Nancy, graduating in 2009. Since then, Thierry has been employed at the University of Lorraine in Nancy working for the Central Direction responsible for University partnering with external bodies. In that role, Thierry has acquired considerable experience of semantic media wiki technology being used by different scientific partners. He is currently considering embarking on a PhD programme in digital humanities, which would start in September 2016.

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Harriet Richardson, Survey of London

[ADD DATA]

Q1

Harriet's undergraduate studies in English were at the University of Nottingham. She then studied for an MLitt in architectural history at the University of St Andrews. Following graduate studies Harriet worked on a study of Scottish hospitals, and later moved to London, where she joined the Survey of London.

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Dr Jenni Hyde

Q1

Dr Jenni Hyde was awarded her PhD from Manchester in 2015. She is a part-time lecturer at Liverpool Hope University and an honorary member of the History Department at Lancaster University.
She specialises in Early Modern ballads, usually C16th, with a second research interest in Protestant martyrs under Queen Mary.
Jenni's research treats ballads as an historical and musical resource, drawing on her earlier professional career as a music teacher.
She is based near Preston in Lancashire.

Q2

Jenni named three top search engines, which she uses in her research.
(1) Google (2) Early English Books Online (EEBO) (3) State Papers Online
When working on a new research topic, Jenni will start with Google. If she gets too many results for a given search in Google, she will use the same search terms but filtered through Google Books and/or Google Scholar and/or Google Images.
On further prompting, Jenni mentioned using (4) British History Online (5) The National Archives Discovery - Advanced Search.
The content of Old Bailey Online is too late for Jenni's work.

Q3

Jenni works predominantly from home, where she uses a laptop, without a second screen. She also has a tablet, which she uses in archives and libraries to take notes.
When working on her laptop, Jenni will have up to ten windows open at any one time. She will close windows when the screen gets too full.

Q4

We took Google as an example to explore how Jenni works with a search engine.
A typical search would be to look in Google for a line of a ballad. A line of a ballad could contain up to ten words, all of which Jenni will enter into the search box, constrained by inverted commas. She will remove the inverted comma constraint if nothing turns up on the search. She will also try alternative spellings - sometimes Google will come up with alternative spellings through its algorithms, but it is nevertheless worth adjusting the search terms for alternative spellings as well.
Jenni works with the default Google setting of ten results per page, and will work through up to five or six screens of ten results per page, before trying something new. Her rule of thumb for initiating a new search is when the results on a given results page of ten results start to look very obviously not relevant.
Easy accessibility is important in determining which search engines Jenni uses - she mentioned that Google Scholar was less visible as a Google service than Google, Google Books and Google Images, which she found irritating.

State Papers Online have quite good fuzzy search functionality, but it is hard to browse results around a given result.
For example, if the term "Thomas Cromwell" produced an interesting result in SPOnline, Jenni will probably want to look at entries for one or two weeks before and after that entry, but it is not easy to get to those results and to browse them.

Q5

We worked through Early English Books Online as a second example of historical search.
Jenni makes great use of this resource, since it contains key content. However, she finds the resource slightly annoying to use, and criticised some of its functionality.
She stated that "all the content is there, but if don't put in exactly the right terms you don't get easily to the content".
An example of this need for exactness is search by bibliographical reference. The inputted reference needs to follow the exact format recognised by the EEBO search engine, including spacing between letters and numbers.
Jenni also criticised the EEBO "sounds-like" function, stating that key word search claims to have a sounds-like function, but that for her purposes it was "not fuzzy enough", and that the degree of fuzziness could not be controlled.

Jenni would like to see two specific improvements to EEBO search:
(1) More "flexibility on results"
(2) Semantic date search, with the EEBO search engine recognising dates contained in the full text of documents and being able to produce results which reference those dates, rather than simply the recognised date of the document. In the case of C16th documents in EEBO, many documents are simply dated 1500-1599. As a result it is hard to date constrain results, and Jenni gets too many results.
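
As a rough illustration of improvement (2), the Python sketch below pulls dates out of full text and filters documents by the years they mention rather than by a catalogue date. It is a minimal sketch under stated assumptions, not a description of how EEBO works: the function names `find_dates` and `filter_by_year` and the sample texts are invented for illustration, and the pattern only handles modern English month names and four-digit years (no regnal years, Old Style dating or period spelling).

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Optional day, optional month name, then a four-digit year starting with 1.
DATE_RE = re.compile(
    r"\b(?:(?:(\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?)?(" + MONTHS + r")\s+)?(1\d{3})\b"
)

def find_dates(text):
    """Return (day, month, year) tuples found in the text; day and month may be None."""
    dates = []
    for match in DATE_RE.finditer(text):
        day, month, year = match.groups()
        dates.append((int(day) if day else None, month, int(year)))
    return dates

def filter_by_year(documents, start, end):
    """Return titles of documents whose full text mentions a year between start and end."""
    return [
        title
        for title, text in documents.items()
        if any(start <= year <= end for _, _, year in find_dates(text))
    ]

# Invented sample texts, for illustration only.
SAMPLE_DOCUMENTS = {
    "Ballad A": "Imprinted at London the 12th of March 1567",
    "Ballad B": "Printed in 1589 and again in 1592",
    "Ballad C": "no date given",
}
```

With the sample data, `filter_by_year(SAMPLE_DOCUMENTS, 1560, 1570)` narrows the set to the one text that mentions a year in that decade, even though all three would carry the same catalogue date range.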

Q6

Jenni's preferred storage method for her research results is large Word documents. In the past she has tried using EverNote, which takes text off a page and stores it in a notebook, but has not found this tool very useful and has discontinued using it.

When starting work on a new research theme (e.g. writing an article on the Pilgrimage of Grace) she will create a new Word document named "Pilgrimage of Grace". She will then paste research results and document extracts into this Word document. These documents can become as large as 1000 Word pages. She will then search within the Word document using standard Word search tools when synthesising the material. She tends to keep the Word document going until she has finished her work on that theme, rather than creating multiple Word documents. At the end of a specific piece of work, she will archive these Word documents, sometimes combining several Word documents into larger Word documents for archiving purposes.

Jenni uses Excel and Access to manipulate data from these Word research documents, but will not tend to enter data directly from her searches into Excel and/or Access, preferring to go through the Word document intermediate stage.

Q7

See answer to Q4 regarding use of Google, and then filtering results via Google Books, Google Scholar and Google Images.

Q8

Jenni has not been an active searcher on the MarineLives wiki for her own research, so we did not pursue this question.

Dr Andy Burn, University of Durham

This interview was focussed on interacting with the MarineLives wiki to generate improvement ideas, and so did not address all eight questions in the interview guide.

Q1

Dr Andy Burn is a post-doctoral fellow at the University of Durham. His current research concerns "Social Relations and Everyday Life in England, 1500-1640", a Leverhulme-funded project led by Professor Andy Wood.^[1] The first year of this project involved extensive research across England in local record offices and archives in which Andy examined mainly legal documents generated by Church and National courts. Andy's research is now moving more online and will mine State Papers (using State Papers Online) as well as Early English Books Online (EEBO), plus local records accessed electronically.

Q2

Andy named unprompted the following five search engines as the main search engines he currently uses.

(1) Google
(2) Google Scholar
(3) National Archives Discovery - Advanced Search
(4) EEBO
(5) State Papers Online

Andy uses Google if he is looking for a specific book. Google is ideal for C19th books, since their full text is available via Google. He uses Google Scholar if he is looking for an article, but commented that accessing Google Scholar is not easy, since the URL is fairly hidden by Google. Andy commented that Google is "quite forgiving", so that a typo in the title of a book or a rough version of the title of a book, will still generate useful results. In contrast to many Google users, who typically use between one and three words in a query, Andy tends to use a significant number of words in his Google queries (ranging between five and ten). He stated that this was due to the specific searches he tends to use Google for - searching for the titles of books and articles.

When looking for archival material, Andy makes use of The National Archives (TNA) Discovery Advanced Search. Typically he will date constrain his TNA search. Often he will also constrain his search to a specific record series, e.g. Exchequer, or Chancery records. He does this by typing in the broad class mark of the record series (e.g. E 134) and then adding the search term or terms he is looking for. Andy tends to be quite specific in the archival series he constrains his searches by, based on good knowledge of the record series and which series are likely to be of use to his research. He does not make use of broader "facetted" type constraints, e.g. "legal", which can be applied using TNA Discovery.

Andy commented that he uses TNA Discovery very differently from Google. He uses Google to get to a specific document, e.g. a book or article, which ideally he will then be able to read online. In the case of TNA Discovery, he is typically preparing a visit to the physical archive. He gave the example of lining up three hundred physical archival documents to view on a three day trip to TNA in Kew. When using TNA Discovery, Andy will typically constrain the search results just to documents held by TNA (excluding results from other archives).

Q8

Andy performed a series of searches on the MarineLives wiki when it was in its old format, hosted by wikispot, with the content organised into separate wikis by volume of depositions. He has not searched on the MarineLives wiki since its consolidation and expansion in May 2015.

We performed two experimental searches on the MarineLives wiki during the interview, which highlighted a number of issues and improvement ideas.

Experiment: two search tasks using the MarineLives wiki

MarineLives wiki: "price of coal"

Google: "MarineLives" + "price of coal"

We explored together how Andy would approach answering two specific research questions using the MarineLives wiki:

(a) Task One: Find five ships containing coal from Newcastle
(b) Task two: Find the price of coal

In the case of task one, Andy performed his first search using the words Newcastle and Coal. This search yielded fourteen results, with Andy choosing to click on the link in the results for the wiki page HCA 13/72 f.52v Annotate. Once on this page, Andy used CTRL+F on his keyboard to open a search box for the page and typed in Newcastle. This yielded three uses of Newcastle on the page, but interestingly all three uses were in the People note section of the page, rather than the transcription of the image on the page. This led Andy to suggest an improvement to MarineLives wiki search functionality whereby a researcher could specify the portion of any page he or she wished to search, e.g. Transcription; Annotations; Metadata.

In the case of task two, Andy typed in price of coal without inverted commas. However, this did not yield useful results. At the interviewer's suggestion he typed in "price of coal" constrained by inverted commas to force a search for the exact phrase. This led to three results, all of which contained the phrase "price of coal". However, a curious feature of the current MarineLives wiki search function is that the snippets displayed in the results page do not highlight the constrained phrase. This can be compared with the superior Google snippets which are produced when a constrained search is performed using "MarineLives" and "price of coal".

MarineLives wiki search improvement ideas

(1) Andy suggested it would be useful to specify sections of the standard wiki pages to search on, e.g. Transcription text only; Annotations only; Metadata only

(2) He is a frequent user of fuzzy search on other search engines and finds this lacking in the MarineLives wiki. He gave the example of searches involving the term Newcastle, in which he would want his search to include name variants such as New Castle and New Castell. This can be done either using fuzzy search functionality or by the use of wild cards such as * or ?. He noted that there are different types of fuzzy search functionality, and that some are better than others. He was not keen on fuzzy search on State Papers Online, and observed more generally that the site was very slow in responding to searches, and that he often found it hard to find a document which he knew existed and had to be in the State Papers Online data repository.

Asked about his use of advanced operators such as AND and OR in queries, Andy stated that he rarely used them, though he did use inverted commas to constrain a phrase.

Asked about his use of natural language questions instead of small numbers of key words in queries, Andy said that he would use natural language questions in Google for a very specific type of enquiry - when he wanted to know the modern place name equivalent of a C16th or C17th place name. He had discovered from his use of Google that a query such as What is the place name [SPECIFY PLACE NAME] called now? would often yield an answer, typically in the form of a genealogical website page with the old and modern name equivalents.

Dr Cathryn Pearce

Q1

Dr Cathryn Pearce is an American, who has studied and worked in Alaska, Canada and England. She received her BA in History from the University of Alaska Fairbanks, and her MA in British and Maritime history from the University of Victoria in British Columbia. She received her doctorate in maritime studies from the University of Greenwich. She edits the peer reviewed online journal Troze for the National Maritime Museum, Cornwall.

Cathryn's current research project is on life saving and coastal communities, and focuses on the private physical manuscript archive of the Shipwrecked Mariners Society. She has imaged the minute books of the Society, together with associated materials. The archive is located in Chichester and is purely a paper archive with no electronic finding aids or search engine.

Q2

Cathryn uses a number of search engines to expand her knowledge of primary materials from the Shipwrecked Mariners Society. Her particular focus is on identifying individuals named in the archive of the Society and establishing links between the individuals.

Cathryn listed the following search engines as ones she uses relatively frequently:
(1) Google (including Google Books and Google Scholar)
(2) British History Online
(3) University accessed search engines, e.g. House of Commons parliamentary papers, EEBO, JSTOR
(4) British Newspaper archive (private subscription)
(5) London Lives/Old Bailey

The actual archive and minutes and then try to fill out - people
Google - leads to Google Books - use both Google and Google Books and Google Scholar, sometimes go direct to specific books. Use Chrome at moment, had been on Firefox.

How use Google? - tend to use linked key words in quotation marks - e.g. paper on Royal Navy and beginnings of Shipwrecked Mariners Society, wanted to identify people and links
admiral sir george cockburn - put in separately initially
Generally searches did come to specific sources in Google Books
Tolerance for results - prepared to look at up to ten pages
Would usually be three or four pages
Refine if not producing right kind of results - e.g. wrong period
Can't constrain by period e.g. C19th
Would also type in title of a book

Other search engines

House of Commons Parliamentary Papers - hate searching using this tool
Looking for a specific act or bill - hard to get to the data - pulls up too many wrong sources. Requires patience.

EEBO - haven't used for a while. Keyword search works fairly well.

Use * if want to find ship wreck so get wreckers or wrecking, avoids lots of searches

How sequence search engines. Organised. Have a general idea of what going to be doing and looking for - know what section of paper and what questions

How record data. Use Word documents. Use EverNote so can save directly from web. Use Zotero (have used for a long time, though not all features) - way to keep documents organised

Word = note taking plus links and data (including the keywords which generated and URLs and click)

EverNote is the collection of everything that see - word is more specific

As write can take material from each and put into Scrivener - software that allows you to pull in all research with a separate screen for writing - so can have notes and text at same time. Always looking for SW (used to be just on Mac, but eventually a Windows version - ca 4 years ago). Also allows you when finishing up to move paragraphs around and can export into Word or Rich Text format - great for
Cork board section - subtitles can move around
Zotero is not integrated into Scrivener - not sure - cutting and pasting into Scrivener.
Still sometimes use index cards

Q3

Work mainly from home. Have a desktop and a laptop. Have both open at same time. Dream of a second monitor.

Q4

Q5

Q6

Q7

Q8

Specific improvements

Haven't used MarineLives wiki recently - will get back to us with some specific improvement ideas

Dr James Brown, University of Sheffield

[ADD DATA]

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Standard search box on the MarineLives wiki

The standard search box in the top right-hand corner of each MarineLives wiki page searches as follows:^[2]

- It searches all pages on the wiki with some restrictions.
- The article content is searched in its raw (wikitext) form - i.e., it searches the text that appears in the edit box when you click "edit", not the rendered page. This means that content coming from an included template will not be picked up, but the target of piped links will be.
- The search is not case-sensitive, so 'MediaWiki', 'mediawiki' and 'MEDIAWIKI' all give the same result.
- The search functionality can be considered to operate on whole words, separated by spaces or other punctuation marks. So if your search term includes the word 'book', the results will not include pages that only have the word 'books' or 'booklet'. And if your search term includes the term 'inter', the results will not include pages that only have the word 'international', but they may include pages that have the term 'inter-national'.
- The results will only include pages that contain all the words in your search.
- You can search for a phrase using double quotes. A phrase can be considered to consist of whole words (case-insensitive), so the phrase 'Prime Minister' will not be found by a search for "ime Min", but it will be found by a search for "pRIME mINISTER".
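
The matching rules listed above can be mimicked with a short Python sketch. It is not MediaWiki's actual search code: the function names (`tokenize`, `parse_query`, `matches`) are hypothetical, and the sketch only reproduces the described behaviour, namely case-insensitive whole-word matching, all words required, and double-quoted phrases matched as runs of whole words.

```python
import re

def tokenize(text):
    """Split text into lower-cased whole words; spaces and punctuation separate words."""
    return re.findall(r"[^\W_]+", text.lower())

def parse_query(query):
    """Return (phrases, words): double-quoted phrases and the remaining single words."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]*)"', query)]
    remainder = re.sub(r'"[^"]*"', " ", query)
    return [p for p in phrases if p], tokenize(remainder)

def contains_phrase(tokens, phrase):
    """True if the phrase's words occur consecutively, in order, in the token list."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def matches(query, wikitext):
    """True if the raw wikitext contains every query word and every quoted phrase."""
    tokens = tokenize(wikitext)
    token_set = set(tokens)
    phrases, single_words = parse_query(query)
    return (all(w in token_set for w in single_words)
            and all(contains_phrase(tokens, p) for p in phrases))
```

For example, `matches("book", "two books and a booklet")` is false because only whole words count, while `matches("inter", "an inter-national treaty")` is true because the hyphen separates two words.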

Search Engine Examples with Semantic Aspects

Semantic search features offered by Google

Facet-based search in biomedical domain: Example: Semedico

Cluster-search: Example: Carrot2

Natural Language Processing facilitated search: Example: EasyAsk for commercial websites

Semantic search on MarineLives Semantic Media Wiki

The MarineLives wiki is a Semantic Media Wiki. For technological background see Rowan Beentje, 'Tech Talk', June 27th 2016

The semantic features of the wiki offer the ability to specify semantic searches.

Semantic search query form using MarineLives Special Ask API
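
As an illustration of what a semantic query looks like, the Python sketch below builds a request URL for the `ask` API module that Semantic MediaWiki provides. It is a sketch under stated assumptions: the location of `api.php` on the MarineLives server and the property name `Has language` are assumptions made for the example, not confirmed details of the wiki. Only the category name `Pages` comes from the category list on this page. The code builds the URL but does not send any request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint location; the real path on the MarineLives server may differ.
API_ENDPOINT = "http://www.marinelives.org/api.php"

def build_ask_url(conditions, printouts=(), limit=20, endpoint=API_ENDPOINT):
    """Build a Semantic MediaWiki 'ask' API URL.

    conditions: strings such as 'Category:Pages' or 'Property::Value',
                each wrapped in [[...]] and combined conjunctively.
    printouts:  property names to return with each result, written as ?Property.
    """
    parts = ["".join("[[" + c + "]]" for c in conditions)]
    parts += ["?" + p for p in printouts]
    parts.append("limit=" + str(limit))
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return endpoint + "?" + urlencode(params)

# Hypothetical query: pages in the 'Pages' category tagged as Dutch.
example_url = build_ask_url(
    ["Category:Pages", "Has language::Dutch"],  # 'Has language' is an assumed property name
    printouts=["Has language"],
    limit=5,
)
```

The same condition string, `[[Category:Pages]][[Has language::Dutch]]`, could equally be pasted into a Special:Ask query form, provided a property of that name exists on the wiki.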

Search screens

National Archives - Advanced Search home page

British Library - Online catalogue access page

British History Online - Home page

Connected Histories - Home page

Useful links

National Archives advanced search
British Library catalogue search
British History online search
Connected Histories search
Semantic MediaWiki

Semantic Media Wiki Examples

MarineLives Wiki

- 5 categories
-- Indexes (31 members)
-- Languages (6 members)
-- Pages (10,025 members)
-- Pages with broken file links (2 members)
-- Volumes (81 members)
- Why do pages not appear under the index? e.g. Dutch tagged pages; Spanish tagged pages?
- http://www.marinelives.org/wiki/Dutch
- http://www.marinelives.org/wiki/Spanish - displays in groups of 200, without any ability to determine the number of results per page

Semantic mediawiki capabilities

Semantic extension: Google Maps format
Semantic extension: OpenLayers format
Semantic extension: KML format

Bibliography

Aula, Anne, Rehan M. Khan, Zhiwei Guan, 'How does Search Behaviour Change as Search Becomes More Difficult', CHI, Atlanta, Georgia, April 10-15, 2010, viewed 05/07/2016
- Presents data from investigation of search failures in small-scale lab studies as well as search engine-logs, looking at signals of user frustration
- Citations of interest include:
-- Aula, A., Majaranta, P., and Räihä, K.-J. (2005) Eye tracking reveals the personal styles for search result evaluation. Proceedings of Human-Computer Interaction - INTERACT 2005, 1058-1061.
-- Brand-Gruwel, S., Wopereis, I. and Vermetten, Y. (2005) Information problem solving by experts and novices: analysis of a complex cognitive skill. Computers in Human Behaviour, 21, 487-508.
-- Jansen, B. and Spink, A. (2006) How are we searching the world wide web? A comparison of nine search engine transaction logs. Information Processing and Management, 42, 248-263.

Grimes, Seth (January 21, 2010). "Breakthrough Analysis: Two + Nine Types of Semantic Search". InformationWeek, viewed 02/07/2016

Haase, Peter, Daniel Herzig, Mark Musen, Thanh Tran, 'Semantic Wiki Search', in L. Aroyo et al. (eds.), ESWC 2009, LNCS 5554, pp.445-460, 2009
- Proposes a search interface that combines the expressiveness and capabilities of structured queries with the simplicity of keyword interfaces and faceted browsing, which are easier for lay end users to handle (p.446)
- Breaks search process into (1) Articulation of the information need (2) Query interpretation using keyword translation (3) Result presentation and refinement
- Proposes a workflow in which (1) Information needs are articulated as key words (2) user queries are translated into structured conjunctive queries (3) conjunctive queries are presented to the user and can be refined by the user, following the paradigm of faceted browsing, adding or removing facets to broaden or narrow the query.
- MediaWiki supports the hierarchical organisation of categories, and Semantic Media Wiki (SMW) can be configured to interpret this as an OWL class hierarchy
- SMW also has a special property of "subproperty of" that can be used for property hierarchies
- Conjunctive queries fall into three categories (1) Entity Queries (2) Fact Queries (3) General Conjunctive Queries
- Entity Queries correspond to a wiki page
- Fact Queries are queries for concrete properties of particular objects, and correspond to one (or more) statements on a page, but not to a page
- General Conjunctive Queries are queries which allow the retrieval of multiple results for a general conjunctive enquiry, which may span multiple statements on many pages
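The three query categories can be illustrated with a toy example. The sketch below is a simplification and is not taken from Haase et al.: the statements, page names and property names are all hypothetical, and the general conjunctive query is reduced to a single variable (the page) constrained by several property/value conditions.

```python
from collections import defaultdict

# Hypothetical statements (page, property, value); not real MarineLives data.
STATEMENTS = [
    ("Page A", "language", "Dutch"),
    ("Page A", "volume", "Volume 1"),
    ("Page B", "language", "Spanish"),
    ("Page B", "volume", "Volume 1"),
    ("Page C", "language", "Dutch"),
    ("Page C", "volume", "Volume 2"),
]

def entity_query(page):
    """Entity query: everything stated about one entity, i.e. one wiki page."""
    result = defaultdict(set)
    for subject, prop, value in STATEMENTS:
        if subject == page:
            result[prop].add(value)
    return dict(result)

def fact_query(page, prop):
    """Fact query: the value(s) of one concrete property of one particular page."""
    return {v for s, p, v in STATEMENTS if s == page and p == prop}

def conjunctive_query(conditions):
    """Conjunctive query: all pages satisfying every (property, value) condition."""
    pages = {s for s, _, _ in STATEMENTS}
    return sorted(
        page for page in pages
        if all(value in fact_query(page, prop) for prop, value in conditions)
    )
```

Adding or removing a condition in `conjunctive_query` narrows or broadens the result, which mirrors the facet-style refinement described above.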

Ruiz-Montiel, Manuela, Joaquin J. Molina-Castro, Jose F. Aldana-Montes, 'TasTicWiki: A Semantic Wiki with Content Recommendation', Conference: 5th Workshop on Semantic Wikis- Linking Data and People; 7th Extended Semantic Web Conference. Hersonissos, Crete, Greece, June 2010. pp. 31-40

Solomou, Georgia, Dimitrios Koutsomitropoulos, 'Towards an evaluation of semantic searching in digital repositories: a DSpace case-study', Electronic Library and Information Systems, Vol. 49 No. 1, 2015, pp.63-90
- Looks at semantic search in DSpace as an example of a popular content-management system (other examples being EPrints, Digital Commons, CONTENTdm, ETB-db, and Fedora)

Veja, Cornelia, Christoph Schindler, Basil Ell, 'Semantic MediaWiki Based Virtual Research Environments: The Case of Semantic Collaborative Corpora Analysis', Barcelona, 28-30.10.2015

Terminology

Exploratory Search

Wikipedia article: Exploratory Search

Semantic Search

Wikipedia article: Semantic Search
- Navigational search vs. Research search
- Research search = providing search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. The user is attempting to locate a number of documents, which in their entirety will provide the desired information.

- Research search is similar to Exploratory search

[1] University of Durham: Dr Andy Burn profile

[2] Mediawiki - Help:Searching


@@ Line 366: / Line 366: @@
 '''Q1'''
-An American who lived in Canada. BA in History University of Alaska Fairbanks. MA British and Maritime emphasis University of Victoria. PhD maritime studies Greenwich. Edting online journal Troze - National Maritime Museum, Cornwall, peer reviewed.
+Dr Cathryn Pearce is an American who has studied and worked in Alaska, Canada and England. She received her BA in History from the University of Alaska Fairbanks, and her MA in British and Maritime history from the University of Victoria in British Columbia. She received her doctorate in maritime studies from the University of Greenwich. She edits the peer-reviewed online journal Troze for the National Maritime Museum, Cornwall.
-Current research - research project on life saving and cosatal communities. Focus on archive Shipwrecked Mariners Society - physical archive , have photographed minute books and associated materials. Archive in Chichester. No electronic finding aids.
+Cathryn's current research project is on life saving and coastal communities, and focuses on the private physical manuscript archive of the Shipwrecked Mariners Society. She has imaged the minute books of the Society, together with associated materials. The archive is located in Chichester and is purely a paper archive with no electronic finding aids or search engine.
 '''Q2'''
-Search engines
+Cathryn uses a number of search engines to expand her knowledge of primary materials from the Shipwrecked Mariners Society. Her particular focus is on identifying individuals named in the archive of the Society and establishing links between them.
-(1) The actual archive and minutes and then try to fill out - people
-) Google - leads to Google Booke - use both Google and Google Books and Google Scholar, sometimes go direct to specific books. Use Chrome at moment, had been on Firefox.
-(2) BHOL
-(3) University search engines - House of Commons Parliamt papers, EEBO, JSTOR
-(4) British Newspaper archive - paid for subscription - pay for personally
-(5) London Lives/Old Bailey - love those sites
-Google
+Cathryn listed the following search engines as ones she uses relatively frequently:
+(1) Google (including Google Books and Google Scholar)
+(2) British History Online
+(3) University accessed search engines, e.g. House of Commons parliamentary papers, EEBO, JSTOR
+(4) British Newspaper archive (private subscription)
+(5) London Lives/Old Bailey
-How use - tend to use linked key words in quotation marks - e.g. paper on royal navy and beginneings of shipwrecked mariners society, wanted to identify people and links
+* The actual archive and minutes and then try to fill out - people
+* Google - leads to Google Books - use both Google and Google Books and Google Scholar, sometimes go direct to specific books. Use Chrome at moment, had been on Firefox.
+How use Google? - tend to use linked key words in quotation marks - e.g. paper on royal navy and beginnings of shipwrecked mariners society, wanted to identify people and links
 '''admiral sir george cockburn''' - put in separately initially
 Generally searches did come to specific sources in Google Books
@@ Line 391: / Line 393: @@
 Would also type in title of a book
-Other engines
+Other search engines
-House of Commons Parlt Papers - hate search - doesn't
+House of Commons Parliamentary Papers - hate searching using this tool
 Looking for a specific act or bill - hard to get to the data - pulls up too many wrong sources. Requires patience.
@@ Line 429: / Line 431: @@
 Specific improvements
-* Haven't used recently
+* Haven't used MarineLives wiki recently - will get back to us with some specific improvement ideas
 ----

