Handling files containing Unicode characters in Harvest

Document ID : KB000027249
Last Modified Date : 14/02/2018

Problem: 

Checking in a Unicode file fails with the following error:

E03020028: The File Agent error for item C:\Temp\Demo\CaseManagement\CaseManagement.sql: Text file contains invalid characters
I00060079: Check in summary: Total: 1 ; Success: 0 ; Failed: 1

Environment:  

CA Harvest SCM all versions and platforms

Cause: 

Harvest supports Unicode files as long as their file extension is defined as Binary in the repository. If the extension is defined as Text, the check-in fails with the error "Text file contains invalid characters".

When a file extension is defined as Text in the repository, Harvest converts the end-of-line and end-of-file markers in files with that extension. Harvest treats all versions of files with that extension as Text from the point at which the extension was defined as Text in the repository. Because different repositories might contain different kinds of code, conversion rules can be specified separately for each repository. Within a repository, conversions are performed based on file type (extension).
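As a rough illustration of why a Unicode file trips a text check, the sketch below flags files that look like UTF-16 or UTF-32 before you attempt a check-in. It is a heuristic written for this article, not Harvest's actual validation logic; the function names and the 4096-byte sample size are assumptions.

```python
import codecs

# Byte-order marks written by editors that save "Unicode" (UTF-16/UTF-32) files.
_WIDE_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def looks_like_wide_unicode(data: bytes) -> bool:
    """Heuristic: True if the bytes look like UTF-16/UTF-32 rather than ANSI or UTF-8."""
    if data.startswith(_WIDE_BOMS):
        return True
    # Without a BOM, fall back on NUL bytes: UTF-16 encodes ASCII characters
    # with a zero byte, which ordinary ANSI or UTF-8 text does not contain.
    return b"\x00" in data


def file_looks_like_wide_unicode(path: str, sample_size: int = 4096) -> bool:
    """Apply the heuristic to the first sample_size bytes of a file (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_wide_unicode(handle.read(sample_size))
```

A file flagged by this check is a candidate for one of the resolutions below: check it in as Binary, or re-save it in ANSI or UTF-8 first.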

Resolution:

The situation can be handled in several ways:

     I.        Check in the file as Binary type

You can check in the file as Binary. Keep in mind that prior versions of the same file, or other files with the same extension, could cause conflicts. Also, Harvest cannot merge such files properly, because the file is treated as Binary. However, external merge tools can be used in such cases to resolve merge conflicts.

The HChgType command can be used to change how such file extensions are treated in Harvest repositories. Refer to the HChgType help in the Harvest Command Line Reference Guide.

435885a.gif

Figure 1

   II.        Save file as ANSI text

If the file can be saved as ANSI text, save it as ANSI text and then check it in.

The following steps show one way to save such a file in ANSI encoding:

a.    Open the file in Notepad

b.    Click on File -> Save As

c.    Choose ANSI in the Encoding drop-down

d.    Save file

e.    Check in the file

435885b.gif

Figure 2

Once the file is saved in ANSI or UTF-8 encoding, it checks in without errors.
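When many files need converting, the Notepad steps can be scripted. The following is a minimal sketch, assuming the source file is UTF-16 with a byte-order mark (what Notepad writes when you choose "Unicode") and that "ANSI" means Windows code page 1252, which is the case on Western European and US Windows systems. The function names are hypothetical.

```python
def utf16_to_ansi(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Re-encode UTF-16 bytes as single-byte ANSI text.

    Raises UnicodeEncodeError if the text contains a character that the
    ANSI code page cannot represent, rather than silently losing data.
    """
    text = data.decode("utf-16")  # the BOM selects the byte order and is dropped
    return text.encode(ansi_codec)


def convert_file(src_path: str, dst_path: str, ansi_codec: str = "cp1252") -> None:
    """Convert one file; binary mode keeps the original line endings untouched."""
    with open(src_path, "rb") as src:
        converted = utf16_to_ansi(src.read(), ansi_codec)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

If the conversion raises UnicodeEncodeError, the file contains characters outside the ANSI code page; in that case, saving as UTF-8 or checking the file in as Binary is the safer choice.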

 III.        The file extension was defined as Text in Harvest

The file is a true binary file (Unicode or regular binary; it doesn't matter which), but its extension is defined as Text. For example, DOC was defined as Text in the Harvest repository and you're trying to check in a Microsoft Word file (which is normally a binary file). Because of the repository definition, Harvest tries to treat the file as Text and reports the invalid characters error.

To correct this, remove the extension from the repository extensions list and then perform the check-in.

 IV.        Checking in XML files

In some instances, checking XML files into a Harvest repository fails with the same "Text file contains invalid characters" error.

This error occurs when the XML files have been saved in the Unicode encoding format. Harvest normally treats XML files as text files; however, when a file is encoded as Unicode, Harvest does not recognize it as a text file and sees it as a binary file.

435885c.gif

Figure 3

If you re-save the source XML file in a non-Unicode format, such as ANSI text, Harvest will then check the file in without any errors.
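One detail to watch when re-saving XML: if the file starts with an XML declaration that names its encoding (for example encoding="UTF-16"), the declaration should be updated to match the new encoding, or XML parsers may reject the file. The sketch below shows one way to do both steps together. It assumes the input is UTF-16 with a byte-order mark and uses "windows-1252" as the ANSI encoding name; the function name is hypothetical.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def reencode_xml(data: bytes, target: str = "windows-1252") -> bytes:
    """Re-encode UTF-16 XML bytes and rewrite the declared encoding to match.

    If the document has no declaration naming an encoding, the text is only
    re-encoded; such a file is read as UTF-8 by XML parsers, so pass
    target="utf-8" in that case.
    """
    text = data.decode("utf-16")  # the BOM selects the byte order and is dropped
    text = _DECL_ENCODING.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}",
        text,
        count=1,
    )
    return text.encode(target)
```

As with plain text files, the encode step raises UnicodeEncodeError when the document contains characters that the target code page cannot represent.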

NOTE: When Unicode files are treated as Binary in Harvest, merges can be problematic.

In Harvest Workbench, you can use the Harvest External Merge tool option to invoke a custom or preferred merge tool and resolve merges.

Additional Information:

For additional information on file extensions, refer to the Harvest Administrator Guide.

For more information on 'hchgtype', refer to the Harvest Command Line Reference Guide.