
How to merge records which have not been marked for deduplication

Each time a new file is added to the CERL Thesaurus, an algorithm checks the new data for records that would duplicate an entity already in the database. However, because the new or the existing record sometimes carries very little information, this algorithm can miss a match. If you come across duplicate entries in a search result that have not yet been marked for deduplication, you can add those marks yourself and merge the records manually. For example:

Identical records in search result

1. Open all duplicate records in separate tabs

You will need to copy and paste the record IDs, so it helps to have all the records open in separate browser tabs.

2. Pick one master record

Pick one record as the master record into which the other records will be merged: the one you like best, the one with the most extensive information, or the oldest. Since any CERL Thesaurus identifier ever assigned to an entity always remains valid, it does not really matter which one you choose; just choose one.

Open this record in the CT Internal Format editor.

master record

3. Create an 831 field for each duplicate

Enter a new field 831 at the bottom of the record for each duplicate record (except, of course, for the one that you have declared the master record and opened in the editor). The first indicator of the new field 831 should be blank and the second indicator must hold the value 1. The field must contain a subfield $a with the CERL Thesaurus ID of a duplicate. Copy and paste those IDs from the records you have opened in the other tabs.

 new 831 fields
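The rule for the 831 fields can also be written down as a small sketch, which may help when preparing a longer list of duplicates before typing them into the editor. This is only an illustration: the `Field831` class, the `build_831_fields` helper and the record IDs are hypothetical, and the sketch models the field in memory rather than reproducing the actual CT Internal Format syntax.

```python
from dataclasses import dataclass


@dataclass
class Field831:
    """Hypothetical in-memory model of one 831 field (not the editor's syntax)."""
    duplicate_id: str
    ind1: str = " "  # first indicator: blank
    ind2: str = "1"  # second indicator: must hold the value 1

    def subfields(self):
        # Subfield $a carries the CERL Thesaurus ID of the duplicate.
        return {"a": self.duplicate_id}


def build_831_fields(master_id, duplicate_ids):
    """Return one 831 field per duplicate, skipping the master record and repeats."""
    seen = {master_id.strip()}
    fields = []
    for record_id in duplicate_ids:
        record_id = record_id.strip()  # pasted IDs often carry stray whitespace
        if not record_id or record_id in seen:
            continue
        seen.add(record_id)
        fields.append(Field831(record_id))
    return fields


# Placeholder IDs, made up for this example.
fields = build_831_fields(
    "cnp00000001",
    ["cnp00000002", " cnp00000003 ", "cnp00000001", "cnp00000002"],
)
for f in fields:
    print(repr(f.ind1), repr(f.ind2), f.subfields())
```

The helper drops the master record's own ID and any ID pasted twice, so that each duplicate ends up with exactly one 831 field.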

4. Merge the records as usual

Save the record and re-open it with the record merger editor.

 Merging records

The record merging editor should show the records you have just entered in the 831 fields (if it shows more than those, please check if the additional records are really duplicates). Just hit Save all and you're done.
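The check described above amounts to a set comparison between the IDs you entered and the IDs the editor displays. A minimal sketch, assuming you have both lists at hand (for example, copied by hand from the editor); the function name and the IDs are hypothetical, and nothing here talks to the CERL Thesaurus itself:

```python
def unexpected_merge_candidates(master_id, entered_ids, shown_ids):
    """Return IDs shown by the merging editor that you did not enter yourself.

    The master record is treated as expected, in case the editor lists it too.
    """
    expected = set(entered_ids) | {master_id}
    return sorted(set(shown_ids) - expected)


# Placeholder IDs, made up for this example.
extra = unexpected_merge_candidates(
    "cnp00000001",
    ["cnp00000002", "cnp00000003"],
    ["cnp00000001", "cnp00000002", "cnp00000003", "cnp00000009"],
)
print(extra)  # IDs to review by hand before saving
```

Any ID returned by the function is a record to inspect manually before you hit Save all.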

 resources/cerl_thesaurus/editing/mergingunmarked.txt · Last modified: 2020/01/23 09:05 by jahnke


