Tuesday, September 13, 2005

 

Processing Text

A new job (actually, a repeat of some work I did two years ago in AppleScript with InDesign 2, but with some new wrinkles) has come in. It requires me to write a suite of scripts to do serious processing to some text. The input is coming in the form of Excel spreadsheets, which I import into InDesign CS2 and the script takes it from there.

This will be a script that performs many functions sequentially. The ultimate goal would be to have the script take the input and convert it into a section of a catalog, but I'll probably fall short of achieving that. The first step is to use the entries in the first row of the table to determine which paragraph style should be applied to each column. The reason for doing this is that later we'll convert these tables back into text with each cell converted to a paragraph (or series of paragraphs if there's more than one in any cell). The paragraph styles will tell a later script (or later part of this script, depending on how things go) what to do with the paragraph in question.

To facilitate the script's growth as I add functionality, I'm going to use user-defined functions for each discrete part of the script. An advantage of this approach is that functions can implicitly make rules about the state of things when they're called, so they don't have to spend any time on assuring that the initial set-up is correct -- they just assume it and leave the calling routine (the main script or a higher function) to make sure things are just so. With that, here goes:
function applyParaStyles(theTable) {
 // Returns true if all OK,
 // else logs the error to global variable myErr and returns false
}
While in a perfect world you might think that all variables should be local and that anything a function needs should be passed as an argument, for certain things it's worth setting up some global variables. For the moment, I'm requiring that my main script set up three global variables: myDoc is a reference to the active document; myLib is a reference to the project library; and myErr is a string variable that is initialized to the empty string and is used to log errors.

Project Library? Almost every job I do uses a library to some extent. The first thing we're going to use it for on this job is to hold the definitions of the paragraph and character styles we're going to need. Later, we'll need it to hold templates of the display elements we'll be using to populate our pages.

Why Log Errors? Eventually, this script will do a lot of processing. It will take a while to run and it will need to deal with out-of-spec data. Rather than just stop when that is detected, it will log an error and continue. That way, at the end of the run I'll have information about all that went wrong.

Thinks: It is distinctly possible that this script will crash InDesign. If that happens, I could lose my error log, so it might be better to log errors to a text file. For now, I'll use a function to log errors so that when things start to become complex, I can change the strategy by simply replacing that function.
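Here is a minimal sketch of what such a logging function might look like, assuming the global myErr string described earlier. The name logError is a placeholder of mine, and the messages are made up. The only point is that every caller goes through one function, so moving from an in-memory string to a text file later means changing one function body.

```javascript
// Global error log; the main script initializes it to the empty string.
var myErr = "";

// Hypothetical helper: appends one message per line to the global log.
// To change strategy later (for example, writing to a text file),
// replace the body of this function and leave the callers alone.
function logError(theMessage) {
  myErr += theMessage + "\n";
}

// Example: two out-of-spec items are logged and processing carries on.
logError("Unknown column head: Pric");
logError("Empty cell in row 12");
```

At the end of the run, myErr holds one line per problem, ready to be displayed or saved.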
 var myLim = theTable.columns.length;
 for (var j = 0; j < myLim; j++) {
  var myStyle = getParaStyle(theTable.cells[j].contents);
 }
Here's the top level of the function. It sets up a loop to process each column. That means that the first thing we have to do is get the paragraph style that goes with each column. Notice that because this information is in the first row, I can simply look at the first myLim cells of the table because those are the cells of the first row. I'm taking advantage of the fact that these tables have no merged cells.
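To make the indexing concrete, here is a small sketch that runs outside InDesign. It assumes, as described above, that the cells come row by row and that nothing is merged. The mockTable object, its data, and the cellIndex helper are stand-ins of mine, not part of the real script or of InDesign's object model.

```javascript
// Mock of a 3-column table with no merged cells. The cells are listed
// row by row, the order in which theTable.cells presents them.
// (Made-up data for illustration, not from the actual job.)
var mockTable = {
  columnCount: 3,
  cells: [
    "Heading", "Body", "Price",      // row 0: paragraph style names
    "Widgets", "Blue ones", "4.99",  // row 1
    "Gadgets", "Red ones", "7.50"    // row 2
  ]
};

// Hypothetical helper: flat index of the cell at (row, column).
// Valid only when no cells are merged.
function cellIndex(theRow, theColumn, columnCount) {
  return theRow * columnCount + theColumn;
}

// The first row is simply cells 0 through columnCount - 1.
var columnHeads = [];
for (var j = 0; j < mockTable.columnCount; j++) {
  columnHeads.push(mockTable.cells[cellIndex(0, j, mockTable.columnCount)]);
}
```

With merged cells the flat index would no longer line up with the columns, which is why the no-merged-cells assumption matters.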

So, already, we need another function, getParaStyle(), to get the paragraph style that corresponds to the name in each column head. For the moment, I'm going to write a temporary function that checks to see if there is a style with the name in question, and if not it will make one. Later, we'll be more discerning, insisting that only existing styles be used and logging an error if we don't recognize the name in the column head -- this is because my client has a number of people working on this job and not all of them use exactly the same terminology (at least, they didn't last time around).

Here's the first attempt at this function:
function getParaStyle(theName) {
 // Temporary function that either returns the paragraph style or makes one and returns it
 try {
  var theStyle = myDoc.paragraphStyles.item(theName);
  theStyle.name; // triggers error if no style with that name exists
 } catch (e) {
  theStyle = myDoc.paragraphStyles.add({name:theName});
  try {
   theStyle.name; // triggers error if the style couldn't be made
  } catch (e) {
   errorExit("Couldn't make style with name: " + theName);
  }
 }
 return theStyle;
}
After all that stuff about error reporting, how come this function directly calls errorExit? Because this is a temporary function. If the paragraph style can't be made, there's no point in continuing.
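Looking ahead, the stricter version mentioned earlier might go roughly like this. It is a sketch that runs outside InDesign, so myDoc here is a plain object with a list of style names rather than a real document. The function name findKnownStyle and the style names are placeholders of mine.

```javascript
// Stand-ins for the globals the main script sets up. In InDesign, myDoc
// would be the active document; here it is a plain object (an assumption
// made so the logic can be tried without InDesign).
var myErr = "";
var myDoc = { styleNames: ["Heading", "Body", "Price"] };

// Hypothetical stricter lookup: returns the style name if it is one we
// know, otherwise logs an error to myErr and returns null so the caller
// can skip the column and carry on.
function findKnownStyle(theName) {
  for (var i = 0; i < myDoc.styleNames.length; i++) {
    if (myDoc.styleNames[i] === theName) {
      return theName;
    }
  }
  myErr += "Unrecognized column head: " + theName + "\n";
  return null;
}

var good = findKnownStyle("Body");  // a known style
var bad = findKnownStyle("Pric");   // a misspelled column head
```

Unlike the temporary function, this one never creates a style and never stops the run; it leaves a line in the log instead. For now, though, the temporary version is what's in the script.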

Woo-hoo! It worked first time! Well, that's enough excitement for one evening.
