Tools: Tech Talk
Rowan Beentje is the designer of the MarineLives Semantic MediaWiki and volunteer technical advisor to the Digital Pop Up Lab. In his day job he works in mobile and web application development for a major media company.
The article below, written by Rowan Beentje, describes the technology behind the MarineLives wiki.
Structure of the wiki
The MarineLives wiki is built on a PHP-based stack:
- MediaWiki
- Semantic MediaWiki extension to allow storage and querying of data across pages
- Semantic Forms extension to allow editing of pages as structured data
- Custom extensions for folio navigation, basic transcription, and behaviour adjusted to match what transcribers expect.
History of the wiki
The wiki was migrated from a collection of separate wikis. Historically, the data was relatively unstructured and spread across one wiki site per volume of depositions (witness statements), plus several further wikis for cross-volume analysis. An importer took the data from each wiki and converted it into structured data on a single wiki, and the analysis wikis were moved to namespaced pages.
Technology approaches available to three teams
The three teams within the Digital Pop Up Lab will need different technology approaches:
Team one: semi-automated recognition of handwritten manuscripts
Rather than starting from scratch with a system such as Tesseract, which struggles with handwriting in general, the recommended approach is to integrate with the Transkribus suite. The base software is written in Java, and there are currently a number of Java clients, together with a JavaScript + LAMP platform known as Transcriptorium that uses the Transkribus web services.
Team two: tailored and semantic search
While it would be possible to build on exported data and use solutions such as GraphQL on top of it, the recommended initial approach is to explore search interfaces and data exploration built on the Semantic MediaWiki data interface. This would probably involve a mix of PHP for any custom extensions and wiki pages edited to use the advanced Semantic MediaWiki query syntax. See: https://www.semantic-mediawiki.org/wiki/Ask_API , https://www.semantic-mediawiki.org/wiki/Help:Selecting_pages , and https://www.semantic-mediawiki.org/wiki/Special:Ask
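As a rough illustration of what querying the Ask API involves, the sketch below builds an `action=ask` request (conditions followed by `?Property` printouts, joined by pipes, as in the Ask API documentation) and flattens the JSON response into one row per page. The endpoint URL, the category `Deposition` and the property `Has deponent` are hypothetical placeholders, not the actual MarineLives schema, and the parsing assumes the default JSON shape where `query.results` is keyed by page title.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real MarineLives api.php location is an assumption.
API_ENDPOINT = "https://example.org/w/api.php"


def build_ask_url(conditions, printouts, limit=50, offset=0):
    """Build a Semantic MediaWiki Ask API (action=ask) request URL.

    conditions: SMW conditions, e.g. ["[[Category:Deposition]]"] (hypothetical).
    printouts:  property names to return, e.g. ["Has deponent"] (hypothetical).
    """
    parts = ["".join(conditions)]            # conditions are concatenated
    parts += ["?" + p for p in printouts]    # each printout is prefixed with "?"
    parts += [f"limit={limit}", f"offset={offset}"]  # paging parameters
    query = "|".join(parts)                  # the Ask query string is pipe-separated
    return API_ENDPOINT + "?" + urlencode(
        {"action": "ask", "query": query, "format": "json"}
    )


def extract_rows(response, printouts):
    """Flatten an Ask API JSON response into one dict per result page."""
    results = response.get("query", {}).get("results", {})
    if not results:
        # An empty result set may be serialised as [] rather than {}.
        return []
    rows = []
    for title, page in results.items():
        row = {"page": title}
        for prop in printouts:
            values = page.get("printouts", {}).get(prop, [])
            # Page-type values arrive as dicts with a "fulltext" title; keep others as-is.
            row[prop] = [
                v["fulltext"] if isinstance(v, dict) and "fulltext" in v else v
                for v in values
            ]
        rows.append(row)
    return rows
```

The same query string can first be tried interactively on Special:Ask before being wired into an extension or a search front end.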
Team three: visualisation of historical data
This team has the most flexibility in the tools it can use to explore the data. The data being visualised may come from custom data sets, or from semantic/annotated data read live from the wiki using the normal MediaWiki APIs or, more likely, the Ask API from Semantic MediaWiki. Custom extensions that expose data transformed from internal APIs could also serve as a data source; how that data is then transformed and presented could take many forms.
See the image for an example of data output generated by the Ask API for further digital processing.
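One simple form that transformation could take is sketched below: counting how often each value of a property appears across Ask API results and writing the counts as CSV, which most charting tools can read. This is a minimal sketch under stated assumptions: the property name `Has ship` and the sample response are hypothetical, and it assumes the default JSON shape in which page-type values carry their title in `fulltext`.

```python
import csv
import io
from collections import Counter


def count_property_values(response, prop):
    """Count how often each value of `prop` appears across Ask API results."""
    # An empty result set may be serialised as [] rather than {}.
    results = response.get("query", {}).get("results", {}) or {}
    counts = Counter()
    for page in results.values():
        for value in page.get("printouts", {}).get(prop, []):
            # Page-type values are dicts with the linked page title in "fulltext".
            label = value.get("fulltext", str(value)) if isinstance(value, dict) else str(value)
            counts[label] += 1
    return counts


def to_csv(counts, header=("value", "count")):
    """Serialise counts as CSV, most frequent first, ready for a charting tool."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for label, n in counts.most_common():
        writer.writerow([label, n])
    return buf.getvalue()


# Hypothetical sample response, for illustration only.
sample = {"query": {"results": {
    "Example page 1": {"printouts": {"Has ship": [{"fulltext": "Ship A"}]}},
    "Example page 2": {"printouts": {"Has ship": [{"fulltext": "Ship A"}, {"fulltext": "Ship B"}]}},
}}}
print(to_csv(count_property_values(sample, "Has ship")))
```

Emitting a neutral format such as CSV keeps the wiki query separate from the presentation layer, so the same export could feed a spreadsheet, a web chart library or a GIS tool.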