LANGUAGE-INDEPENDENT CONTENT

Keywords: language-independent content (fields, parts), cross-language fields, language-variant-scoped fields

This document is about how to handle document content (mostly fields, but possibly parts) that needs to be the same for all language variants.

Writing down the problem and the possible solutions should help in finding the best one.

Why / motivation

Avoid the work of manually syncing the values between multiple language variants.

Avoid interference with translators' work and translation management (the synced-with link): since no translation is involved, syncing that content should not be a job for translators. The language-independent content should not interfere with TM import/export either.

The different scopes at which content can be stored

Data can be addressed by a certain (possibly composite) key. The most common one in Daisy is the document variant, which is addressed by {document ID, branch, language}. A variant always includes versioned content, for which either the last version (for updates) or the live version (for publishing) is used by default. The versioned content is addressed by {document ID, branch, language, version}.

The following table shows some possible 'key' combinations, and what sort of data they address.

document  branch  language  version  comment
--------  ------  --------  -------  -------

Currently existing in Daisy

X         -       -         -        global document properties: ID, private, owner
X         X       X         -        non-versioned variant properties: collections, document type, ...
X         X       X         X        versioned variant properties: document name, fields, parts, links

Not yet existing in Daisy

X         -       X         -        cross-branch properties
X         -       X         X        versioned cross-branch properties
...
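The existing key scopes can be sketched as plain value types. This is only an illustration: the class names (DocumentKey, VariantKey, VersionKey), the helper methods and the sample IDs are hypothetical and not part of the Daisy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentKey:
    """Scope of global document properties (ID, private, owner)."""
    document_id: str


@dataclass(frozen=True)
class VariantKey:
    """Scope of non-versioned variant properties (collections, document type)."""
    document_id: str
    branch: str
    language: str

    def document(self) -> DocumentKey:
        # Widen the key: drop branch and language.
        return DocumentKey(self.document_id)

    def version(self, version: int) -> "VersionKey":
        # Narrow the key: add a version number.
        return VersionKey(self.document_id, self.branch, self.language, version)


@dataclass(frozen=True)
class VersionKey:
    """Scope of versioned variant properties (name, fields, parts, links)."""
    document_id: str
    branch: str
    language: str
    version: int


# Two language variants of one document share the document-level scope
# but have distinct variant-level scopes.
en = VariantKey("123", "main", "en")
fr = VariantKey("123", "main", "fr")
```

Language-independent content needs a scope that sits between these: wider than one variant, since the language is dropped, but narrower than the whole document if it is to stay branch-specific.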

What other CMSs do

Very little was found.

LinguaPlone: http://plone.org/products/linguaplone

For language-independent fields, they seem to use the (automatic) 'duplicate them' approach.

What content should be shareable across language variants

  • fields

  • parts

  • collection membership

  • document type

For fields and parts, we can extend the schema model to indicate they should be language-independent.

For collection membership and document type, we should take a decision for Daisy as a whole, rather than making it configurable.

Implementation approaches

The three main approaches I can think of are:

  • (possible today) move language-independent fields into a separate document, and link to that document

  • have a document model that supports language-independent fields

  • keep the document model as is, but let the repository enforce that the values of language-independent fields are the same for all language variants. Thus when one language variant is updated, the other variants are automatically updated as well.

The next sections go into more detail.

Move language-independent content into a separate document

This can be seen as a cheap version of native support for language-independent fields.

The pros are mostly the same as for real language-independent fields.

Looking at things this way might also help to think about how to implement real language-independent fields: for each document, we could have an additional 'virtual' document (one that is not externally addressable) that holds the language-independent content.
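A minimal sketch of the 'virtual document' idea, assuming the shared content is keyed by {document ID, branch} and merged into each variant on read. The names (shared_fields, variant_fields, read_fields), the field names and the sample data are invented for illustration; none of this is existing Daisy code.

```python
# Language-independent fields, stored once per (document ID, branch).
# This plays the role of the 'virtual' document.
shared_fields = {
    ("123", "main"): {"productCode": "AB-42"},
}

# Language-dependent fields, stored per (document ID, branch, language).
variant_fields = {
    ("123", "main", "en"): {"title": "Blue chair"},
    ("123", "main", "fr"): {"title": "Chaise bleue"},
}


def read_fields(document_id, branch, language):
    """Return the fields of one variant with the shared fields merged in."""
    merged = dict(variant_fields[(document_id, branch, language)])
    # Shared values are applied last so that they win over any stale
    # per-variant copy of the same field.
    merged.update(shared_fields.get((document_id, branch), {}))
    return merged
```

With this layout, read_fields("123", "main", "en") and read_fields("123", "main", "fr") both report the same productCode, and changing it in shared_fields changes it for every language at once.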

Support language-independent fields in the document model

Pro:

  • efficient storage

  • no problem / interference with synced-with

  • if there is a large amount of language-independent content, this approach avoids much duplication

  • allows putting new versions of the language-independent content live separately from the language-dependent content (con: the same version is live for all languages)

Con:

  • the versioning problem: this cross-language content should also be versioned.

    • We now have two version histories: that of the language variant, and that of the language-independent content. Which version of the language-independent content should be combined with which version of the language-dependent content? Maybe use the same behavior as with normal linking? (which is what exactly? always live)

  • Probably organization-specific, but approval and 'put live' workflows might now need to cover both the language-independent and the language-dependent content.

Other:

  • cross-language fields should behave as if they are fields of the individual variants (with respect to publishing, searching, ...)

  • we can think of this approach as being similar to linking to a separate document, but more transparently integrated.

  • how would editing work? e.g. when is validation for these fields checked, what to do with the 'put live' flag, ...

  • repository API: is editing of cross-language content done via the variants, or as a separate entity / document?
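To make the versioning question concrete, here is a sketch of the 'always live' option mentioned above: whatever version of a language variant is requested, it is combined with the currently live version of the language-independent content. The VersionedContent class, the resolve function and the sample fields are assumptions made for this example, not a proposal for the actual repository API.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedContent:
    """A version history plus a pointer to the live version."""
    versions: list = field(default_factory=list)
    live: int = 0  # 1-based version number; 0 means nothing is live yet

    def add_version(self, fields, make_live=True):
        self.versions.append(dict(fields))
        if make_live:
            self.live = len(self.versions)
        return len(self.versions)

    def get(self, version):
        return self.versions[version - 1]


def resolve(variant, shared, variant_version):
    """Combine one variant version with the *live* shared version."""
    combined = dict(variant.get(variant_version))
    if shared.live:
        combined.update(shared.get(shared.live))
    return combined


variant = VersionedContent()
shared = VersionedContent()
variant.add_version({"title": "Blue chair"})
shared.add_version({"price": 10})
variant.add_version({"title": "Blue armchair"})
shared.add_version({"price": 12})
```

Under this policy, resolve(variant, shared, 1) shows the old title together with the current price of 12, even though the price was 10 when variant version 1 was written. Whether that is acceptable is exactly the open question.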

Duplicated, auto-synced fields between language variants

Idea: implement language-independent fields by letting the repository enforce that field values are the same for all variants. Thus when a language-independent field is updated, this results in new versions of all the language variants, each carrying the new field value.
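The idea can be sketched as follows. This is a toy in-memory model, not repository code: the variants dictionary, the LANGUAGE_INDEPENDENT set (which would really come from the schema) and the save function are all hypothetical names chosen for the example.

```python
# Each variant is a list of versions; a version is a dict of field values.
variants = {
    "en": [{"title": "Blue chair", "productCode": "AB-42"}],
    "fr": [{"title": "Chaise bleue", "productCode": "AB-42"}],
}

LANGUAGE_INDEPENDENT = {"productCode"}  # assumption: declared in the schema


def save(language, new_fields):
    """Save a new version of one variant and propagate shared fields."""
    old = variants[language][-1]
    variants[language].append(dict(new_fields))
    # Only language-independent fields whose value actually changed
    # need to be pushed to the other variants.
    changed = {
        name: value
        for name, value in new_fields.items()
        if name in LANGUAGE_INDEPENDENT and old.get(name) != value
    }
    if not changed:
        return
    for other, versions in variants.items():
        if other == language:
            continue
        # New version for the other variant: same content, new shared values.
        versions.append({**versions[-1], **changed})


save("en", {"title": "Blue chair", "productCode": "AB-43"})
```

After the call, both variants have a second version, and the French one keeps its own title but carries the new productCode. An edit that touches only language-dependent fields creates no versions in the other variants.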

Pro:

  • few modifications to the document model (stays simple, no incompatible changes)

  • easy switching between language-independent or not

  • easily decide for each individual language when to put these changes live

Con / difficulties:

  • updates to language-independent fields can only be done if there are no locks by other users on the variants

    • side-problem: even if locked by current user, it might be in a different editing session. Maybe we should support session-based locking too?

  • what to do with the version state and synced-with pointer of the variants which are updated as a result of updating the language-independent fields?

    • version state: keep version state of previous version (unless requested otherwise)

    • synced-with: leaving it at the previous value seems the only reasonable thing, but then the synced-with pointer would no longer be in sync. Under some circumstances, we could auto-update the synced-with pointer (the variant was already synced with the previous last major change, and the edit only contains changes to cross-language fields)

  • will cause update events (JMS, email notifications) for all variants (JMS event is necessary since there is really an update happening, email notifications might be possible to suppress)
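Two of the rules above are small enough to write down as a sketch. The function names, parameters and data shapes are assumptions made for illustration, and the lock check ignores the session-based locking side-problem.

```python
def blocking_locks(locks, user):
    """Return the languages whose variant is locked by another user.

    `locks` maps a language to the lock owner, or to None if unlocked.
    A shared-field update can only proceed if the result is empty.
    """
    return sorted(
        language
        for language, owner in locks.items()
        if owner is not None and owner != user
    )


def next_synced_with(current_synced_with, previous_last_major,
                     new_reference_version, changed_fields,
                     language_independent):
    """Decide the synced-with value of an automatically updated variant.

    The pointer moves forward only if the variant was already synced with
    the previous last major change and the edit touched nothing but
    language-independent fields; otherwise the previous value is kept.
    """
    changed = set(changed_fields)
    was_in_sync = current_synced_with == previous_last_major
    only_shared = bool(changed) and changed <= set(language_independent)
    if was_in_sync and only_shared:
        return new_reference_version
    return current_synced_with
```

For example, a variant synced with version 3 of the reference language moves to version 4 when version 4 only changes a language-independent field, while a variant that was still at version 2 stays at 2 and remains flagged as out of sync.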

Thoughts:

  • this only enforces that the last version of the fields is the same for all documents

Other

  • separate access control for modifying language-independent content?

Conclusions

While the approach with duplicated storage might be simpler to implement initially, it leaves open some big questions about what to do with the synced-with pointer. Because of this alone, real language-independent fields seem to be the best option after all.