
Differences

This shows you the differences between two versions of the page.

resources:cerl_thesaurus:editing:newinterface [2018/03/27 15:47] – [4.5 Deduplicating Records] jahnke
resources:cerl_thesaurus:editing:newinterface [2018/03/27 15:51] – [4.5 Deduplicating Records] jahnke
Line 247: Line 247:
  
 The editor to merge duplicate records is only shown among the options for choosing an editing client if the record holds an indication of possible duplicates (''#831 #1'' or ''meta.possibleMatch''). The new interface takes a slightly different approach to deduplication than the old WinADH client: instead of showing two records next to each other, the new client shows what the final record would look like if all possible duplicates were merged together. This is meant to make comparing records easier, since all fields that are supposed to hold the same information are shown in the same spot, color coded by the source record they were derived from.
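The merged preview described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the function name ''merged_preview'', the record identifiers and the sample values are all hypothetical, and the real client may order or collapse fields differently.

```python
def merged_preview(records):
    """Combine the fields of several records into one preview list.

    `records` maps a (hypothetical) record id to a list of (tag, value)
    pairs. Each entry of the result also carries the id of the record it
    came from, which is what a color coding could be based on.
    """
    merged = []
    for record_id, fields in records.items():
        for tag, value in fields:
            merged.append((tag, value, record_id))
    # Stable sort: fields with the same tag end up next to each other,
    # and the starting record's fields stay first within each tag.
    merged.sort(key=lambda entry: entry[0])
    return merged


# Invented sample data, not taken from any real record.
candidates = {
    "start-record": [("200", "name form A"), ("340", "dates A")],
    "candidate-1": [("200", "name form B")],
}
for tag, value, source in merged_preview(candidates):
    print(tag, value, source)
```

Grouping by field tag is what puts comparable information in the same spot; the source id attached to each entry stands in for the color.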
 +
 +To find potential duplicates, search for ''dedup:'' followed by the record type, e.g. ''dedup:cnp'' for persons. You can combine it with other search terms to get smaller result sets, for example ''external_id:gnd AND dedup:cnp'' to limit the set to records from the GND file.
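If such searches are assembled repeatedly, a tiny helper can build the query string. This is a minimal sketch: only the ''dedup:'' and ''external_id:'' terms and the ''AND'' operator come from the text above, while the function name ''dedup_query'' is a hypothetical illustration and not part of the interface.

```python
def dedup_query(record_type, *extra_terms):
    """Build a search string for potential duplicates.

    Extra terms are placed in front of the `dedup:<type>` term and all
    terms are joined with AND, mirroring the example in the text.
    """
    if not record_type:
        raise ValueError("record_type must not be empty")
    terms = list(extra_terms) + [f"dedup:{record_type}"]
    return " AND ".join(terms)


print(dedup_query("cnp"))
print(dedup_query("cnp", "external_id:gnd"))
```

The resulting string is meant to be pasted into the search box; the sketch does not talk to the search service itself.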
  
 The duplicate candidate records are shown in the right column. The first (white) one is the record you started with; since this is the record the others are finally merged into, you cannot deselect it from deduplication.
Line 254: Line 256:
 Please note that by unchecking a record, you say that it is definitely not a duplicate of the first (white) one, and by clicking the checkmark you say it definitely is. There is no "I’m not sure" option as there was in WinADH – if you are not sure, then it is not a duplicate.((If you are not sure about that either, hit cancel and try another record.))
  
-Please also make sure to check the biographical dates (340), activity notes (350, 300) and titles (291) to decide whether the records refer to the same entity. If that is not sufficient, also try to look at the source systems the records come from and see which titles they are linked to (this is not always possible, of course). Always bear in mind that you are most likely the one who has the final word on whether these records are duplicates. Once you hit save, the records will no longer show up when someone checks for duplicates, unless somebody stumbles across them by accident. If you say two records are not duplicates, they will probably never again be considered for merging, and if you say two records are duplicates, they will probably never be separated again, even if they are not.
+Please also make sure to check the biographical dates (340), activity notes (350, 300) and titles (291) to decide whether the records refer to the same entity. If that is not sufficient, also try to look at the source systems the records come from and see which titles they are linked to (this is not always possible, of course). Always bear in mind that you are most likely the person who has the final word on whether these records are duplicates. Once you hit save, the records will no longer show up when someone checks for duplicates. If you say two records are not duplicates, they will probably never again be considered for merging, and if you say two records are duplicates, they will probably never be separated again, even if they are not.
  
 Once you've made your decisions and unchecked or checked the non‐duplicates and duplicates, you can change the order of the fields. This might be desirable for the following fields: 200, 210, 212, 215, 340, 350. If any of these fields is repeated within a record, the first occurrence is used for generating the short display in the search result set. To change the order of the fields, drag a field with the mouse and drop it at a new position further up or down.
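Why the order matters can be shown with a short sketch of the "first occurrence wins" rule. The representation of a record as a list of (tag, value) pairs, the function names and the sample values are assumptions made for illustration; only the field tags and the rule itself come from the text above.

```python
# Field tags named in the text as relevant for the short display.
SHORT_DISPLAY_TAGS = ("200", "210", "212", "215", "340", "350")


def short_display_values(fields):
    """Return the first value of each short-display tag, in record order."""
    first = {}
    for tag, value in fields:
        if tag in SHORT_DISPLAY_TAGS and tag not in first:
            first[tag] = value
    return first


def move_field(fields, old_index, new_index):
    """Return a copy of `fields` with one entry moved, like a drag and drop."""
    reordered = list(fields)
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


# Invented sample data, not taken from any real record.
record = [
    ("200", "name form A"),
    ("200", "name form B"),
    ("340", "dates A"),
]
print(short_display_values(record)["200"])
print(short_display_values(move_field(record, 1, 0))["200"])
```

Moving the second 200 field above the first one changes which value the short display would pick, which is the effect the drag and drop reordering is meant to achieve.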
 resources/cerl_thesaurus/editing/newinterface.txt · Last modified: 2023/12/11 15:13 by jahnke

 

 
