resources:cerl_thesaurus:editing:mergingunmarked [2018/06/18 12:12] – created jahnke
====== How to merge records which have not been marked for deduplication ======

Each time a new file is added to the CERL Thesaurus, an algorithm checks whether the new data contains records that would duplicate an entity already in the database. However, because the new or the existing record sometimes carries very little information, this algorithm may occasionally miss a match. If you come across a duplicate in the search results that has not yet been marked for deduplication, you can add the marks yourself and merge the records manually. For example:

{{ :resources:cerl_thesaurus:editing:dedupman1.png?700 |Identical records in search result}}

=== 1. Open all duplicate records in separate tabs ===
You will need to copy and paste the record IDs, so it is handy to have all of them open in separate browser tabs.

=== 2. Pick one master record ===
Choose a master record into which the other records will be merged: for example, the one you like best, the one with the most extensive information, or the oldest. Since every CERL Thesaurus identifier ever assigned to an entity remains valid permanently, it does not matter much which one you choose; just pick one.

Open this record in the CT Internal Format editor.

{{ :resources:cerl_thesaurus:editing:dedupman2.png?700 |master record}}
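
The choice of master record can be written down as a simple rule. The following Python sketch is purely illustrative and not part of the CERL Thesaurus software: the ''Record'' class, its ''created'' date and the IDs are hypothetical, and it assumes that "most extensive information" can be approximated by the number of fields, with the oldest record winning a tie.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a CERL Thesaurus record."""
    ct_id: str  # made-up ID such as "cnp00000001"
    created: str  # assumed creation date in ISO format "YYYY-MM-DD"
    fields: list[str] = field(default_factory=list)  # field tags, e.g. "100", "400"


def pick_master(records: list[Record]) -> Record:
    """Prefer the record with the most fields; on a tie, prefer the oldest one."""
    if not records:
        raise ValueError("at least one record is required")
    # min() over (-field count, date): more fields sort first, and ISO date
    # strings compare chronologically, so the earliest date wins a tie.
    return min(records, key=lambda r: (-len(r.fields), r.created))


duplicates = [
    Record("cnp00000002", "2015-03-01", ["100", "400", "400"]),
    Record("cnp00000001", "2012-07-15", ["100", "400", "400"]),
    Record("cnp00000003", "2019-11-20", ["100"]),
]
print(pick_master(duplicates).ct_id)  # cnp00000001
```

Any other rule (for example, always the oldest record) works just as well, since every identifier stays valid after the merge.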

=== 3. Create an 831 field for each duplicate ===
At the bottom of the master record, add a new field 831 for each duplicate record (except, of course, the master record itself, which you have opened in the editor). The first indicator of each new 831 field should be **blank**, and the second indicator must be **1**. The field must contain a **subfield $a** holding the CERL Thesaurus ID of one duplicate. Copy and paste these IDs from the records you have opened in the other tabs.

{{ :resources:cerl_thesaurus:editing:dedupman3.png?700 | new 831 fields}}
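
Assembling the 831 fields is mechanical enough to sketch in a few lines. This hypothetical Python helper is not a CERL tool: it writes each field as ''831 _1 $a <ID>'', where ''_'' stands for the blank first indicator. That notation is an assumption for illustration and may differ from how the CT Internal Format editor displays fields. The helper skips the master record's own ID and any ID pasted twice.

```python
def build_831_fields(master_id: str, candidate_ids: list[str]) -> list[str]:
    """Return one 831 field per duplicate, skipping the master and repeats.

    Hypothetical notation: '_' = blank first indicator, '1' = second
    indicator, '$a' = CERL Thesaurus ID of the duplicate.
    """
    seen = {master_id}
    result = []
    for ct_id in candidate_ids:
        ct_id = ct_id.strip()  # tolerate whitespace picked up when copying
        if not ct_id or ct_id in seen:
            continue  # skip empty entries, the master itself, and repeats
        seen.add(ct_id)
        result.append(f"831 _1 $a {ct_id}")
    return result


# Made-up IDs: the master is cnp00000001, two duplicates were copied from other tabs.
print(build_831_fields("cnp00000001", ["cnp00000002", " cnp00000003 ", "cnp00000001"]))
# ['831 _1 $a cnp00000002', '831 _1 $a cnp00000003']
```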

=== 4. Merge the records as usual ===
Save the record and reopen it in the record merging editor.

{{ :resources:cerl_thesaurus:editing:dedupman4.png?700 | Merging records}}

The record merging editor should list the records you have just entered in the 831 fields. If it shows more records than those, check whether the additional ones really are duplicates. Then click **Save all** and you're done.

{{ :resources:cerl_thesaurus:editing:dedupman5.png?700 |}}
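
The check in step 4 amounts to comparing two lists of IDs. As a hypothetical sketch (again not part of the CERL Thesaurus software), assuming you have the IDs from your 831 fields and the IDs the record merging editor offers to merge (excluding the master record), the following Python function returns the records that deserve a closer look before you click **Save all**.

```python
def unexpected_records(entered_ids: list[str], shown_ids: list[str]) -> list[str]:
    """Return IDs shown by the merging editor that were not entered in an 831 field."""
    entered = set(entered_ids)
    # Keep the editor's display order so the result is easy to match on screen.
    return [ct_id for ct_id in shown_ids if ct_id not in entered]


# Made-up IDs: cnp00000009 was not entered by hand, so check it before saving.
print(unexpected_records(["cnp00000002", "cnp00000003"],
                         ["cnp00000002", "cnp00000009", "cnp00000003"]))
# ['cnp00000009']
```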
 resources/cerl_thesaurus/editing/mergingunmarked.txt · Last modified: 2020/01/23 09:05 by jahnke

 

 
