InDesign GREP styles
I have a few documents that have legal lines for four different regions (Japan, China, Macau, and Arabic). The legal line is composed mostly of English, followed by the Japanese characters, Chinese characters, and so on. For example: chinese-characters TM and © My company. All rights reserved.
I have followed a tutorial on dual fonts using GREP styles. I've made four character styles, one per country, each with its own designated font. The text box in the document is linked to a paragraph style that contains the English font I would like to use (Arial). I'm using Unicode ranges for each language. For example, the CJK range for China is 4E00–9FD5.
My GREP styles are as follows:
Apply Style: Macau
To Text: [\x{4E00}-\x{9FD5}\x{3000}-\x{303F}]+
Apply Style: China
To Text: [\x{3000}-\x{efff}\x{4E00}-\x{9FD5}\x{3300}-\x{33FF}][^.,;:?!\d]+
Apply Style: Japan
To Text: [\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{FF00}-\x{FFEF}\x{3000}-\x{303F}\x{4E00}-\x{9FD5}]+
Apply Style: Arabic
To Text: [\x{0600}-\x{06FF}\x{0750}-\x{077F}][^.,;:?!\d]+
The GREP styles above work well for Japanese documents but not for the Chinese or Macau documents. If I change the order of the GREP styles so that the Chinese or Macau files work, then the Japanese document stops working.
My dilemma is that I can't have a different document for each country and load its own GREP styles, since the documents share content with each other.
I was wondering whether there is a specific GREP style order that I should follow, or whether I am missing something fundamental for it to work properly across all four languages.
1 Comment
The regular expression engine takes the first match that is possible. For example, the regular expression foo|foo bar will never match foo bar, simply because it will always match foo first. The engine makes a match and continues its work from that point forward, never looking back.
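A minimal sketch of that behaviour, using Python's re module as a stand-in for InDesign's GREP engine (an assumption; the two engines are not identical, but both try alternatives from left to right):

```python
import re

# Alternatives are tried left to right; the first one that succeeds wins,
# so the shorter "foo" is taken and "foo bar" is never considered.
first_wins = re.search(r"foo|foo bar", "foo bar").group()

# Listing the longer alternative first lets it match in full.
longer_first = re.search(r"foo bar|foo", "foo bar").group()

print(first_wins)    # foo
print(longer_first)  # foo bar
```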
A similar thing happens with GREP styles, except in reverse: the individual styles are applied separately, and the last one in the list takes precedence, so the last style overrides the others. Put simply, China will override Japan if it is below Japan in the list, because Japan is mostly a subset of China the way you have written the ranges.
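The override behaviour described above can be modelled roughly as follows. This is only a sketch in Python, not InDesign itself: apply_styles is a hypothetical helper, and the Macau and Japan patterns from the question are rewritten with Python's \uXXXX escapes because Python's re module does not accept \x{XXXX}.

```python
import re

# Patterns taken from the question, translated to Python escape syntax.
MACAU = ("Macau", r"[\u4E00-\u9FD5\u3000-\u303F]+")
JAPAN = ("Japan", r"[\u3040-\u309F\u30A0-\u30FF\uFF00-\uFFEF\u3000-\u303F\u4E00-\u9FD5]+")

def apply_styles(text, styles):
    """Return one style name (or None) per character.

    Styles are applied top to bottom, and a later style overwrites
    whatever an earlier style assigned to the same character.
    """
    result = [None] * len(text)
    for name, pattern in styles:
        for match in re.finditer(pattern, text):
            for i in range(match.start(), match.end()):
                result[i] = name
    return result

sample = "日本語のテキスト"  # three kanji followed by five kana

japan_last = apply_styles(sample, [MACAU, JAPAN])
macau_last = apply_styles(sample, [JAPAN, MACAU])

print(japan_last)  # every character ends up as Japan
print(macau_last)  # the three kanji end up as Macau, the kana stay Japan
```

With Japan at the bottom of the list a Japanese line looks right, but any run of pure Chinese characters is claimed by Japan as well, which matches the symptom in the question.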
Fixing your problems
OK, so how do you fix this? I am not sure it is possible as long as both use the same kanji ranges. This is not the sort of thing that GREP is any good at, unless you can make the ranges not overlap at all.
However, your GREP expressions are almost certainly wrong. Let's look at the individual parts first:
[\x{3000}-\x{efff}\x{4E00}-\x{9FD5}\x{3300}-\x{33FF}]
Look: 3000 < 3300 < 4E00 and EFFF > 9FD5 > 33FF, which means the ranges after the first are redundant, and the class is equivalent to writing:
[\x{3000}-\x{efff}]
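The redundancy can be checked mechanically. The following Python sketch expands each range into its code points and compares the two sets; covered is a hypothetical helper written for this illustration.

```python
def covered(ranges):
    """Expand inclusive (low, high) code point ranges into a set of code points."""
    points = set()
    for low, high in ranges:
        points.update(range(low, high + 1))
    return points

# The three ranges in the China character class, and the first range alone.
china_class = [(0x3000, 0xEFFF), (0x4E00, 0x9FD5), (0x3300, 0x33FF)]
simplified = [(0x3000, 0xEFFF)]

same_characters = covered(china_class) == covered(simplified)
print(same_characters)  # True
```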
This equivalence assumes there is no bug in the matching engine. The second thing is that you use two different kinds of matching logic:
Any number of characters from the range, used by Macau and Japan:
[...]+
One character from the range followed by any number of characters that are not in the excluded set, used by China and Arabic:
[...][^.,;:?!\d]+
Now, the second of your patterns is really odd. It results in, for example, the Arabic style being applied even if the later part of the sentence is in Japanese, while the Japan pattern is nowhere near as greedy. This is almost certainly not what you intended. In addition, it will make debugging hard.
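To see how far the second kind of pattern reaches, here is a small Python sketch. The sample string is made up for illustration, and the Arabic pattern from the question is rewritten with \uXXXX escapes because Python's re module does not accept \x{XXXX}.

```python
import re

# Pattern from the question: one Arabic character, then anything that is
# not punctuation or a digit.
ARABIC_THEN_ANYTHING = r"[\u0600-\u06FF\u0750-\u077F][^.,;:?!\d]+"
# The same character class using the "any number of things in range" logic.
ARABIC_ONLY = r"[\u0600-\u06FF\u0750-\u077F]+"

# One Arabic letter (U+0645) followed by Japanese and English text.
sample = "\u0645日本語 and English. More text"

wide = re.search(ARABIC_THEN_ANYTHING, sample).group()
narrow = re.search(ARABIC_ONLY, sample).group()

print(len(wide))    # 16: the match runs through the Japanese and English up to the period
print(len(narrow))  # 1: only the Arabic letter
```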
Attempting to fix this
Like I said, I'm not sure it's possible. You could try a different strategy: instead of matching any number of things in a set, match anything as long as it does not violate the set. For this you need to use lookarounds. Unfortunately, lookbehinds in particular generally cannot have arbitrary width, so this may not work out very well for you. In essence, regular expressions aren't really up to this job.
An alternative strategy opens up if you have some character or position that you can match at the beginning and end of your text, such as the start and end of a paragraph. Then you can easily match a run that MUST start and end there, and the match will fail for any run that contains anything invalid.
So say you want to do this per paragraph. Let's assume, for simplicity of testing, that the paragraph may only contain lowercase ASCII letters, spaces, periods and commas. Then ^[a-z .,]+$ would match:
the old man sighed but did not answer, and they moved on
in silence. the surf grew suddenly louder, as they emerged
from the forest upon a stretch of sand dunes bordering the sea.
but not
The old man sighed but did not answer, and they moved on
in silence. The surf grew suddenly louder, as they emerged
from the forest upon a stretch of sand dunes bordering the sea.
because it contains invalid characters (the uppercase T in each "The").
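The same test can be run outside InDesign. This Python sketch treats each three-line sample above as one paragraph joined with spaces (an assumption about how the text is laid out) and uses Python's re module as a stand-in for the GREP engine.

```python
import re

# Anchored at both ends: the whole paragraph must consist of allowed characters.
ALLOWED_ONLY = re.compile(r"^[a-z .,]+$")

lowercase_paragraph = (
    "the old man sighed but did not answer, and they moved on "
    "in silence. the surf grew suddenly louder, as they emerged "
    "from the forest upon a stretch of sand dunes bordering the sea."
)
mixed_case_paragraph = (
    "The old man sighed but did not answer, and they moved on "
    "in silence. The surf grew suddenly louder, as they emerged "
    "from the forest upon a stretch of sand dunes bordering the sea."
)

matches_lowercase = ALLOWED_ONLY.search(lowercase_paragraph) is not None
matches_mixed = ALLOWED_ONLY.search(mixed_case_paragraph) is not None

print(matches_lowercase)  # True
print(matches_mixed)      # False
```

Because of the anchors, a single disallowed character anywhere in the paragraph rejects the whole paragraph rather than just shortening the match.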
PS
Either way, you need to recognize that regular expressions are not up to every job where you want automated heuristics. This is probably one of them. Use something more sophisticated.