i18n and user generated content

by ekes

One of the issues with user generated content multi-lingual sites is that content will often be in only one language, or maybe translated into a few others. You can be certain that not all content that is even translated will be available in all languages.

The way that the Drupal i18n module, for both Drupal 5 and 6 by default handles translated content is to offer you:-

  1. Only current language and no language
  2. Only current and default languages and no language
  3. Only default language and no language
  4. Only current language
  5. All content. No language conditions apply

Anything but number 2 is going to hide a lot of content from your users. Depending on the nature of the site you may well want them to see that there is activity and content from others even if they can't read it, and in a multilingual environment it's not so uncommon for users to comprehend other languages either. Hiding it from them as if it doesn't exist isn't really an option.

The localizer module solves this by letting the user weight their preferences of languages, and making a multiple join weighted query. Other factors about the modules design mean I've not had the chance to try this out in a large site environment yet, but I hope to have a chance to analyse it's performance, maybe when it's ported to Drupal 6.

So back with i18n the problem with option 2 is that it will still hide content from users if it's not translated in their preferred language. In fact despite what you expect it won't even display the content if it is in the default language and has been translated into another language if neither of them are your choice.

The issue comes up with the solution of a 'mixed_single' 'Current, default language when current not present and no language', which does show those translated default language posts. So the only thing you are missing now is content that is in languages other than the users and the default.

You end up with queries like this however (node access is on as well):-

SELECT DISTINCT(node.nid), node.sticky AS node_sticky, node.created AS node_created_created FROM node node LEFT JOIN i18n_node def ON node.nid = def.nid LEFT JOIN i18n_node ndf ON ndf.trid = def.trid AND def.trid != 0 AND ndf.language = 'es' INNER JOIN node_access na ON na.nid = node.nid WHERE (((def.language = ndf.language) != 0 or (def.language is null and ndf.language is null) or (def.language = 'en' and ndf.language is null) or (def.language = 'es' and ndf.language is null) )) AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'og_public'))) AND ( (node.status = '1') ) ORDER BY node_sticky DESC, node_created_created DESC LIMIT 0, 10;

As i18n_node is just indexed on nid this can be really slow. Adding an index including language, trid and nid halved query times, but they can still be really very long, like about 5 minutes on a 2 Xeon CPU with a gig of memory a standard join_buffer_size and around 40,000 nodes. Plenty of default pages will bring up queries like this, so overriding and avoiding anything that won't fit in memory is a tough option. I've stared at it for some time but I can't make the present tables work with a query that would do it better than the collective intelligence on the thread has already.

All this looks sorted In Drupal 6 with the it's in build i18n the new version of the i18n module and the active translation module. Here you get the original if it's not translated into your preferred language. There are a couple of translation tables and (without the node access) were down to one INNER JOIN :-

SELECT n.nid, n.sticky, n.created FROM node n INNER JOIN active_translation ON n.nid = active_translation.fr WHERE 1 = 1 AND ( n.promote = 1 AND n.status = 1 )ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;

The active translation table is keeping a row per node of original and any translations, it's going to get big, but it's simple too. It could be possible to use the idea in Drupal 5, (maybe?): also hook into _nodeapi add another table, then patch i18n (we're doing it already above) to use this table for the query instead. Not quite so elegant...

Roll on making sites in D6.

Comments

Thanks a lot, it works for me

Thanks a lot, it works for me after 3 o 4 attempts, im still on drupal5, and made a mixup a bit, but for the third attempts the things cleared up for me.

information

thank you for this information. its really useful and easy to follow

jen x

Update: "preferred" added to 5.x-3.x

A version of the double left join has been added to the i18n module i18n issue "Preferred language option" from an immediate view it still suffers the same complicated join something like:

SELECT count( DISTINCT(node.nid)) FROM node node LEFT JOIN i18n_node i18n ON node.nid = i18n.nid LEFT JOIN i18n_node i18n2 ON i18n.trid = i18n2.trid AND i18n2.language = 'en' WHERE (i18n.nid IS NULL OR i18n.language ='en' OR (i18n.language ='en' AND i18n2.nid IS NULL)) AND (node.status = '1')