Guide to Editing the CERL Thesaurus
1 The CERL Thesaurus Web Interface
With the migration into the hosting environment, the previous editing infrastructure, consisting mainly of the WinADH editing client, has become obsolete and can no longer be used. All editing can now be done via your web browser. This also allows users of non-Windows operating systems to edit the CT from their computers.
1.1 Logging in
To use the editing facilities in the new web interface you will need a user account for the new hosting environment. The login details for WinADH or the CERL Website will not be recognised by the new system. There is, however, only one account for all databases hosted on http://data.cerl.org; if you already have an account for the ISTC or MEI, let the database administrator know and they will assign the necessary editing rights to it.
If you do not yet have an account for data.cerl.org, contact the database administrator, who will then set up an empty account for you. You will be notified of this by email and will need to fill in your user details and explicitly activate the account before you can use it.
To log in, click the login button at the bottom of the page, which will take you to the login form. Fill in your username and password and hit “Login”.
Please note that an account is only necessary for editing; searching and downloading records does not require a login.
1.2 Forgot your password
If you ever forget your password, you can request a temporary login that will allow you to set a new password. Hit the “Forgot your password?” button at the bottom of the login form and, in the form that pops up next, fill in either your username or, if you have forgotten that too, your email address. You will then receive an email with further instructions.
1.3 Maintaining your user data
Once you have successfully logged in, your name will be displayed in the bottom navigation bar, in the place where the login button was before. Clicking on your name will take you to the page where you can manage your account.
If you want to change your full name, your email address or set the preferred interface language, you can do that here. To save your changes hit “Done”.
1.4 Changing your password
To change your password, scroll down to the box labelled “Change your password” and enter a new password. Make sure you pick a password that is difficult for others to guess: not your name or the name of your institution, your children or your pet dog, not your birthday, and nothing like “cerl”, “thesaurus” or similar. The longer your password, the more difficult it is for others to guess or crack.
1.5 Deleting your account
If you want to delete your account, scroll down to the box labelled “Delete account” and hit “Delete this account”. The system will ask you once whether you are sure; if you confirm, you will be logged out and your account will be deleted immediately. Please remember that your account in the new hosting environment is not database-specific: if you delete your account from the CERL Thesaurus interface, you also delete it for all other databases (ISTC, MEI, SBTI, etc.).
A deleted account is gone for good and cannot be restored. If you have accidentally deleted your account, you will need to request a new one (see above).
Deleting your account will remove all your data from the system. However, the records you have edited will retain your (numeric) user id. Once your account has been removed, it is no longer possible to tell from the user id to whom it belonged. A user id will never be reassigned to another user.
1.6 Changing the interface language
A change of your preferred interface language on your account page only takes effect after you log out and log in again. To change the interface language immediately, but only for the current session, click the link labelled “Language” in the navigation bar at the bottom of the page. A menu will pop up that lets you choose a different language.
Please note that not all databases within the hosting environment are available in all languages. If you select a language that is available for the CERL Thesaurus and then switch to a different database that does not support that language, the system will fall back to the language set as “preferred language” in your account settings (if you are logged in); if that is not available either, to the default language of your browser (see your browser's help pages for how to change it); and if that does not work either, to English.
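The fallback order described above can be summarised in a few lines of code. The sketch below only illustrates that order and is not the hosting environment's actual implementation; the function name, its parameters and the language codes are hypothetical.

```python
from typing import Optional, Sequence


def resolve_interface_language(
    session_language: Optional[str],
    preferred_language: Optional[str],
    browser_language: Optional[str],
    supported: Sequence[str],
) -> str:
    """Pick the interface language for a database (hypothetical sketch).

    Candidates are tried in the order described in the guide: the language
    chosen for the current session, the preferred language from the account
    settings (None if not logged in), the browser's default language and,
    as a last resort, English.
    """
    for candidate in (session_language, preferred_language, browser_language):
        if candidate is not None and candidate in supported:
            return candidate
    return "en"


if __name__ == "__main__":
    # The session language "it" is not supported, so the account setting wins.
    print(resolve_interface_language("it", "de", "fr", supported=["en", "de"]))  # de
```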
1.7 Navigation
The new CT interface provides two navigation bars: one at the top of the screen for the functions used by all users, and one at the bottom for less frequently used functions. In addition, there is a menu available from the top navigation bar under the item labelled “More…”, which contains all other available options. Depending on the access rights associated with your account, the items displayed here may differ from those shown to other users.
From the top navigation bar you can access the search results screen (“Search”), the alphabetical index browse lists (“Browse”), your search history (“History”) and your Bookmarks (“Bookmarks”). The bottom bar allows you to access your account page (see above), the main menu of the hosting environment where you can switch to another dataset (“Datasets”), the CERL Thesaurus help section on the CERL Website (“Help”) or the contact details of the CERL Secretariat (“Contact”).
2 Searching the CERL Thesaurus
2.1 Full text search
Probably the easiest approach to searching the CT is to use one or more terms in the full text search field. A full text search in the new environment will typically yield more hits than in the previous system, due to different indexing principles applied to the data. (To let your search mimic the behaviour of the old system, click on “Search for names only” in the right column afterwards; see below for more details.)
By default, your search results are grouped by the types of entities they contain: Corporate Names, Imprint Names, Places and Persons. Each group displays the first 25 hits and, if there are more, a link to the full set. You may click a heading in these lists to access the full record.
Clicking on “Show all … records” will present the records of that set in a list that can be paged through using the buttons at the bottom of the page.
To bookmark a record, click the bookmark icon in the top right corner of the record's full display or of the short list that appears when clicking on “Show all … records”. You can access your bookmarked records from the item “Bookmarks” in the top navigation bar.
2.2 Limiting and sorting your search results
If your search yields too many hits, you can reduce the size of your result set using one of the limiting options in the right column of the result screen.
The option “Search for names only” will limit your search to the headings and variant names fields of a record, thus emulating the search behaviour of the old CERL Thesaurus web interface.
The “by feature” limit allows you to select those records from the current set that contain provenance information, links to printers' devices or images (portraits). The number in parentheses indicates the size of the result set after applying the limit. You may also limit your results by the gender of the described person (persons and imprint names).
The search results are by default sorted alphabetically. If you want to consult them in chronological order, you may do so by clicking “chronologically (earliest first)” or “chronologically (latest first)” from the “Sort your result” section below the Limit options.
2.3 Using search keys
You may formulate more complex queries using the following search keys in the search box. A search key must always be followed by a colon and then the value you are searching for, without any blanks in between. You may also use parentheses and the Boolean operators AND, OR and NOT (note that the Boolean operators must be written in capital letters).
Truncation is possible by adding an asterisk (*) to the search term. To search for phrases, enclose them in double quotation marks (").
| Search key | Description | Example |
| address | search within the address of an entity | address:"Kerk-straat" |
| corporateName | search only within the names (headings and variants) of corporate entities | corporateName:university |
| dedup | search for records marked for deduplication (see below) | dedup:cnl |
| external_id | search for IDs from external files | external_id:gnd or external_id:(gnd 1029934118) |
| feature | search for records with certain features: prdv (printers' devices), prov (provenance information) | feature:prdv |
| gender | search for persons or printers by their gender: a (female), b (male), u (unknown) | gender:a |
| id | search for a CERL Thesaurus ID | id:cnc00006222 |
| imprintName | search only within the names (headings and variants) of printers etc. | imprintName:sermartelli |
| last_changed | search for records last changed on a certain date or within a certain time period | last_changed:[2017-12-01 TO *] (all records changed on or after 1 December 2017) |
| name | search only within name fields | name:hamburg |
| note | search only within note fields | note:printer |
| placeName | search only within the names (headings and variants) of places | placeName:hamburg |
| personalName | search only within the names (headings and variants) of persons | personalName:aristoteles |
| record_flag | search for records bearing a specific marker | record_flag:ba18 |
| related_to | search for records that link to the given record ID (a number of further search keys, not listed here, allow more detailed searches for particular relationship types) | related_to:cnl00032270 |
| sign | search within the sign/marks/devices fields | sign:tortuga |
| type | search for a certain record type | type:cnp |
| year_end | search for entities whose activity or existence ended before, in or after a certain year | year_end:>1800, year_end:<1500 |
| year_start | search for entities whose activity or existence started before, in or after a certain year | year_start:[1530 TO 1560] |
A complete list of available search keys can be found here: Indexing of the CERL Thesaurus
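Because queries with search keys are plain strings, they can also be assembled programmatically, for example for scripted checks. The sketch below uses hypothetical helpers that are not part of the CERL interface: one builds a key:value clause (optionally as a phrase in straight double quotes), one joins clauses with an upper-case Boolean operator, and one wraps a sub-query in parentheses. Range values such as [1530 TO 1560] are passed through unchanged.

```python
def clause(key: str, value: str, phrase: bool = False) -> str:
    """Return one search-key clause such as 'type:cnp'.

    Key, colon and value are joined without blanks. With phrase=True the
    value is wrapped in straight double quotes for a phrase search.
    """
    if phrase:
        value = f'"{value}"'
    return f"{key}:{value}"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator, written in capital letters."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f" {op} ".join(clauses)


def group(query: str) -> str:
    """Wrap a sub-query in parentheses."""
    return f"({query})"


if __name__ == "__main__":
    # Truncated name search combined with an end-year limit.
    print(combine("and", clause("personalName", "aristot*"), clause("year_end", "<1500")))
    # -> personalName:aristot* AND year_end:<1500
    # Parentheses around an OR, combined with a record type.
    print(combine("AND", clause("type", "cnl"),
                  group(combine("OR", clause("placeName", "hamburg"),
                                clause("placeName", "altona")))))
    # -> type:cnl AND (placeName:hamburg OR placeName:altona)
    print(clause("address", "Kerk-straat", phrase=True))  # -> address:"Kerk-straat"
```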
2.4 Working with the Search History
The last 20 searches you have performed (or records you've consulted in full display) are recorded in your search history and can be seen from the item “History” in the top navigation bar. You can go back to a particular search or record by clicking on the query or record id.
The search history is associated with the current session. If you are inactive in the web interface for more than two hours, log out or clear your browser cookies, the session will be terminated and the search history will no longer be available.
2.5 Browsing alphabetical indexes
You may search for names of places, corporate bodies, printers, persons and entities for which provenance information is available by using alphabetical browse lists. They are available from the item “Browse” in the top navigation bar.
Select an alphabetical browse list from the dropdown, type the first few characters of your search term into the field next to it and hit “Browse” to open the list at the desired position. To go to the previous or next section of the list, use the navigation buttons at the bottom of the page.
The number at the left of each entry indicates the number of records that hold the particular name form. Clicking on the entry itself will trigger a search and present the records.
3 CERL Thesaurus Record
The CERL Thesaurus records are accessible from a persistent HTTP URI (URL) containing the id of the record. When requested from a web browser, the record will be displayed as an HTML page. Other applications may set the HTTP Accept header to a value other than “text/html” to retrieve the data in a different format; for details, see the API documentation.
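As an illustration, the following Python sketch builds such a request with the standard library. It rests on assumptions: the URI pattern https://data.cerl.org/thesaurus/<record id> and the media types in MEDIA_TYPES are not taken from this guide, so check the API documentation before relying on them. The request is only built and inspected here; the actual download is left commented out.

```python
import urllib.request

# Assumption: record URIs follow this pattern (check the API documentation).
BASE_URI = "https://data.cerl.org/thesaurus/"

# Assumed media types for the formats described under "Other Formats".
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
}


def record_request(record_id: str, fmt: str = "json") -> urllib.request.Request:
    """Build a GET request for a record, asking for `fmt` via the Accept header."""
    return urllib.request.Request(
        BASE_URI + record_id,
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


if __name__ == "__main__":
    req = record_request("cnc00006222", "turtle")
    print(req.full_url, req.get_header("Accept"))
    # Uncomment to actually fetch the record (requires network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8")[:500])
```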
3.1 Types of entities
In the CERL Thesaurus there are descriptions for four different types of entities: Places (primarily places of printing, but also places of manuscript writing or places where persons were active or corporate bodies were located), Imprint names (all names that might appear in the imprint of a book and that do not denote any type of intellectual responsibility for its content, i.e. primarily printers, but also publishers, booksellers, illustrators etc.), corporate entities and persons (both of which may either have had intellectual responsibility for a book's content or have been owners of books).
The type of entity a record describes is encoded in the prefix of the record ID. If you use the form-based editor to create a new record, select the type of entity from the drop-down labelled “Type of Entity” (second from the top). If you edit an existing record, this drop-down has no effect at all (i.e. you cannot change the entity type by just setting it to a new value).
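A minimal sketch of reading the entity type from an ID prefix. This guide uses cnp for persons and cnl for places; mapping cnc to corporate names and cni to imprint names, as well as the eight-digit ID pattern (inferred from examples such as cnc00006222), are assumptions, and the helper itself is hypothetical.

```python
import re

# Prefix -> entity type. cnp and cnl are used in this guide;
# cnc (corporate names) and cni (imprint names) are assumptions.
ENTITY_TYPES = {
    "cnp": "Person",
    "cnc": "Corporate name",
    "cni": "Imprint name",
    "cnl": "Place",
}

# Assumed ID shape: three-letter prefix followed by eight digits.
RECORD_ID = re.compile(r"(cn[a-z])(\d{8})")


def entity_type(record_id: str) -> str:
    """Return the entity type encoded in a CERL Thesaurus record ID."""
    match = RECORD_ID.fullmatch(record_id)
    if match is None or match.group(1) not in ENTITY_TYPES:
        raise ValueError(f"not a recognised record ID: {record_id!r}")
    return ENTITY_TYPES[match.group(1)]


if __name__ == "__main__":
    print(entity_type("cnl00032270"))  # Place
```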
3.2 Headings and Names
The CERL Thesaurus records the preferred name forms of an entity as they are used in CERL Member libraries. Heading forms differ according to the language used for cataloguing and the cataloguing rules applied. The CERL Thesaurus does not declare one of these name forms as authoritative, but lists them alphabetically, indicating the institutions that use each particular form. Thus, there is no “CERL form” for a heading. The name form given as a headline in a record's full display is the first heading recorded within that record; its prominent position does not imply any preference for that form.
Variant name forms now have an additional code indicating the type of the name form ($0 in 4xx fields). This code is mandatory and defaults to varn (“variant name”).
For a detailed description of the various fields and the applicable input conventions, see:
See also: the section on complex fields in the form based editor
3.3 Sources
The CERL Thesaurus records three types of sources for the information given in a record: reference works, editions printed/written by the described entity and books owned by the described person or corporate body.
There is no prescribed syntax or a controlled list of abbreviations for reference works in the “Found In” field.
The CERL Thesaurus is neither a bibliographic database nor a provenance database, so editions and copies are recorded only as sources of evidence. Complete lists of works printed, written or owned by an entity can be provided elsewhere (e.g. HPB, HPB-Item, MEI etc) and a link can be made from the Thesaurus record instead.
For a detailed description of the available fields and the applicable input conventions, see:
Sources that apply to a specific piece of information only (such as bibliographic data or an individual variant name form) are recorded in that particular field.
3.4 Notes and Biographical Data
There are four categories of (public) notes in a CERL Thesaurus record:
General notes can hold any information on the described entity that does not fit in any of the other categories.
Biographical data and dates of activity should always also give the dates mentioned in the note in machine-readable form in a separate subfield, which can then be used for sorting records chronologically or filtering for records from a certain time period. Biographical data can also be given as dates of existence of a corporate entity. “Founding dates” or similar dates of places, however, are not given here.
In the form based editor, biographical data and dates of activity are two separate fields, while in the (Unimarc-based) “CT Internal Format” the same field is used with a different value in the first indicator position.
Any other information about a person's, printer's or corporate body's academic title / degree, domain of artistic expression, intellectual responsibility for a publication, language(s), profession/occupation, religious affiliation, title(s) of nobility, traded items or any other activity is given as an activity note. In an activity note it is possible to use controlled vocabulary. When controlled vocabulary is used, the corresponding URI is given as well or, if there is no URI, the source vocabulary is indicated.
Any aspect of the entity's description relating to geography in any way or form is given as a geographic note. A geographic note can also hold terms from a controlled vocabulary including their URI or a country code. Place records often indicate the country/province/region the place belongs to nowadays as a means of identification or disambiguation.
Unless controlled vocabulary or codes are used, all note fields are free-text fields, so the language of the note is always indicated. Note fields also provide a subfield to record the source the information is taken from.
For a detailed description of the available fields and the applicable input conventions, see:
3.5 Related Places
Records for persons, printers and corporate entities can be linked to places where they have lived or have been active in any way. The type of relationship between such an entity and a place can be specified as well, which facilitates more sophisticated search queries (for example for persons that went to school in a given place). It is also possible to record an address together with the place name. In the web interface, related places are plotted on a map if the place records contain geo-coordinates.
There can also be a relationship between two places. Currently it is possible to describe a place as being a part of another, larger place, for example a village that is nowadays part of a larger city.
There are a number of place name records that do not describe an individual place but a place name that is the name for a number of places. These records are marked as being “used for more than one entity” (code 3 in field 110$a) and can link out to the records for the individual places by that name. This relationship is typed as “see instead”.
If there is no further specification of how a place is related to the entity described in the record, 'act' (place of activity) is used for persons, corporate bodies and printers and 'relp' (related place) is used for places.
For a detailed description of the applicable input conventions, see:
3.6 Other Related Entities (Persons, Printers, Corporate Bodies)
CERL Thesaurus records for persons, printers and corporate bodies can link to other records for any of these entity types to indicate family relationships, predecessors and successors in business, persons being members of a corporate body etc. The type of relationship is always indicated with the link to the target record.
Records for places can only link to other places, not to persons, printers or corporate bodies (while any of these can link to a place record).
For a detailed description of the available fields and the applicable input conventions, see:
In the Unimarc-based CT Internal Format three different fields are used, depending on the type of entity the link points to. In the form based editor there is only one (complex) data entry field for links to persons, printers and corporate bodies. Here the type of the target entity is selected from the drop down list.
3.7 Signs, Marks, Devices
Any sign, image or mark associated with a given entity can be described in the records for persons, corporate bodies and printers. This could be provenance marks, library stamps, printers' devices etc. Signs, particularly printers' devices, used together with or instead of an address are recorded together with the place of activity (see above). Links to images of such devices/marks/signs can be provided as well. For a detailed description of the applicable input conventions, see:
Images of the actual entities themselves (e.g. portraits) can be added as well. It is, however, not possible to upload image data to the CERL Thesaurus server; instead, a link to an external resource is set. When doing so, make sure that the intellectual property rights of others are not violated. For a detailed description of the applicable input conventions, see:
3.8 External Resources
Apart from links to images of printers' devices, provenance marks or similar, there are two types of links to external resources within a CERL Thesaurus record. The first can link to any type of resource that provides further information on the described entity or is relevant in any other way. This kind of link is displayed as “Online Resource” in the “More Information” section.
The second kind of link is displayed in the right column and expresses a certain type of relationship between the described entity and the external resource. This could be:
- a bibliographic record in a library catalogue (or similar database)
- a search for books owned by the described person or corporate body in an external database
- a portrait/depiction of the described entity
- any entry in an online encyclopedia (e.g. Wikipedia) or biographic lexicon about the described entity
- the entry for the described entity in another authority file
For a detailed description of the available fields and the applicable input conventions, see:
3.9 Geographic Coordinates
Geographic coordinates should be given for place name records (cnl) only. Please note that the format of the coordinates varies according to the editing form you are using. In the form-based editor and the YAML editor, coordinates are given in decimal form as a floating point number without a degree sign (°). Use positive numbers for north and east and negative numbers for south and west.
In the CT internal format editor, the coordinates are given in degrees, minutes and seconds preceded by a letter indicating the hemisphere, see 123 Coded Data Field: Latitude and Longitude for details.
Please note that the syntax of field 123 assumes that a place is designated by a square-like shape, defined by its outermost latitudes and longitudes. In the CERL Thesaurus, the coordinates of a place are those of a single point in or near the center of the place, so both coordinates must be recorded twice here.
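The conversion from the decimal form used in the form-based and YAML editors to the degrees/minutes/seconds form of field 123 can be sketched as follows. The hdddmmss layout (hemisphere letter, three-digit degrees, two-digit minutes and seconds) and the dictionary keys are assumptions for illustration; check the field 123 description for the exact subfields and layout. Seconds are rounded to whole numbers, and the single point is repeated so that it fills all four bounding values.

```python
def to_dms(value: float, positive: str, negative: str) -> str:
    """Convert decimal degrees to hemisphere letter + dddmmss (assumed layout).

    Positive values (north/east) get the `positive` letter, negative values
    (south/west) the `negative` one; 0 is treated as positive.
    """
    letter = positive if value >= 0 else negative
    total = round(abs(value) * 3600)  # whole seconds, rounded once
    degrees, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{letter}{degrees:03d}{minutes:02d}{seconds:02d}"


def point_as_bounding_box(lat: float, lon: float) -> dict:
    """Record a single point as a 'box': each coordinate appears twice."""
    longitude = to_dms(lon, "E", "W")
    latitude = to_dms(lat, "N", "S")
    return {
        "westernmost_longitude": longitude,
        "easternmost_longitude": longitude,
        "northernmost_latitude": latitude,
        "southernmost_latitude": latitude,
    }


if __name__ == "__main__":
    # Approximate coordinates of Hamburg in decimal form.
    print(point_as_bounding_box(53.55, 9.9933))
```

Rounding the total number of seconds once, rather than each component separately, avoids producing an invalid value such as 60 seconds.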
3.10 Other Formats
Internally, CERL Thesaurus records are stored as JSON objects. From the web interface, the (actual) internal format of a record can be retrieved in JSON or YAML. For compatibility reasons the interface also supports the old “CT Internal Format” based on Unimarc Authorities as a working format. Data can be entered and edited equally in either the old or the new format (see below).
Additionally CT data can be retrieved in RDF as well (RDF/XML, Turtle or JSON-LD) to use it in Linked Data applications. There are, however, a few pieces of information that are currently not translated into RDF, so RDF description sets are marginally less complete than records in the (JSON-based) internal format or the old (Unimarc-based) “CT Internal Format”.
4 Editing the CERL Thesaurus
At the bottom of every record you will find the buttons that allow you to edit the data in the CERL Thesaurus (if you have the necessary access rights, that is). This follows the principle that things that are important to every user are placed on the top of the screen and items that are of importance only to the CERL appointed editors are a bit more hidden away at the bottom.
Depending on your editing rights, there can be up to three icons:
The rightmost icon, showing a small rubbish bin, will delete the record. Please be careful with this one: you will be asked for confirmation (twice, actually) before the record is deleted, but once you have confirmed, it is gone for good.
The middle one, showing two sheets of paper, will make a copy of the record and open it in the form based editor. This can be useful if you are creating a number of very similar records and want to avoid typing the same things over and over, although it is probably more useful in the other databases in this hosting environment than in the CERL Thesaurus.
The leftmost button, showing a notepad and a pencil, will open the current record for editing. There are currently four different editors available; depending on your editing rights, you will be presented with a list from which you can pick the one best suited to what you want to do with the record.
4.1 Selecting an Editor
If your account has complete editing rights on the CERL Thesaurus, you will be shown the following selection of editing clients. Each has its strengths and weaknesses, and you may want to choose a client according to what you want to edit in the record and how familiar you are with the Thesaurus's internal format.
4.2 Using the Form Based Editor
The form based editor presents the record in a web form where each data element is in a separate field and where each field has a plain text label. If you are not (yet) very familiar with the CERL Thesaurus format or want to enter a new record from scratch, this editor might be a good choice. On the other hand, the record structure of the Thesaurus is quite complex and records can get rather lengthy, which can lead to enormously long forms that can be difficult to navigate.
The form consists of two main parts: data about the entity that the record describes (the actual “content” of the record) and data about the record itself. Both are folded by default, click on a part to open it.
The part Description of the Record holds the record status, the internal (cataloguer's) note field, any internal record flags, information about definite duplicates, possible duplicates and definite non-duplicates, and the editing history.
Fields you cannot edit, like the editing history, have a pale grey background.
Fields may contain subfields. In these cases the subfields are contained in a separate sub-form, which is also folded by default, so that you can only see the field's title. You will recognise these folded sub-forms by a small black downward-pointing arrow on the left. Click on the field's title to open the sub-form. A sub-form has a grey background to make it easier to see which data belong together.
A big blue plus sign on the right of a field means that the field is repeatable and that you can add another field of this kind by clicking on it. Please note that to add another complex sub-form, you have to open the last sub-form to find the blue plus button. When you add fields, the order of fields in the form is preserved.
When done editing, click the green Save button at the bottom of the form, or Cancel if you want to discard your changes.
See also How to use the form based editor for a more detailed introduction to the form based editor.
4.3 Using the CT Internal Format Editor
The “Internal Format” editor allows you to edit the CERL Thesaurus data in the same Unimarc-based field format as in the old Avanti environment. It is called “Internal Format” because this was the internal format of the old system and is still used for ingest and updates, although the actual internal format is now quite different (JSON).
The format description and input conventions can be found here. Different from the old WinADH client, the Internal Format editor is used for editing records only - for merging duplicate records there is now a different editor available (see below).
Compared to the form based editor, the display of the information stored in a record is much more compact and easier to grasp at a glance. For experienced editors, typing field numbers and subfields is probably faster than handling a complex form.
There is also no input validation implemented yet: any invalid fields, indicator positions or subfields will be silently ignored when the data is converted to JSON, so make sure to check in the record's full display that all information has been entered correctly.
4.4 Using the YAML Editor
The YAML editor is the most powerful of the available editing clients. It allows you to manipulate the internal data structure directly, and you should use it only if you know exactly what you are doing.
Internally, records are stored in JSON. YAML is a way to represent JSON data structures in a conveniently readable form1), replacing the curly brackets with indentation (use blanks, not tab stops!) and the square brackets with hyphens. Before using this editing client, get familiar with the basic YAML syntax.
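To illustrate how the two notations correspond, the sketch below renders a parsed JSON object as YAML lines: mappings (curly brackets) become indented key: value lines and lists (square brackets) become hyphenated items. It is a deliberately simplified, hypothetical helper: it does not quote or escape strings, does not translate booleans or null, and does not handle empty containers. The sample keys are invented and do not reflect the real record structure; for real conversions a YAML library (for example PyYAML's yaml.safe_dump) is the better choice.

```python
import json


def to_yaml(node, indent: int = 0) -> list[str]:
    """Render nested dicts/lists/scalars as YAML lines (simplified sketch)."""
    pad = "  " * indent  # two blanks per level; YAML does not allow tab stops
    lines = []
    if isinstance(node, dict):  # {...} becomes indented "key: value" lines
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {value}")
    elif isinstance(node, list):  # [...] becomes "- item" lines
        for item in node:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.extend(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {item}")
    else:  # plain scalar, written with str()
        lines.append(f"{pad}{node}")
    return lines


if __name__ == "__main__":
    sample = json.loads(
        '{"id": "example-0001", "names": [{"name": "Example", "type": "varn"}],'
        ' "tags": ["a", "b"]}'
    )
    print("\n".join(to_yaml(sample)))
```

For the sample, the output shows the curly brackets replaced by indentation and the list items introduced by hyphens:

    id: example-0001
    names:
      -
        name: Example
        type: varn
    tags:
      - a
      - b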
For longer records, you can switch the editor to full screen mode by pressing [F11] or clicking the yellow button labeled “Fullscreen”. To return to normal mode press [F11] again or [ESC].
If you try to save your record but nothing happens, you have most likely made an error in the YAML syntax. Since the editor does not yet return a validation report, you may try copying the record into an external validator (e.g. http://www.yamllint.com/) to see what went wrong.
4.5 Deduplicating Records
The editor for merging duplicate records is only shown among the options for choosing an editing client if the record holds an indication of possible duplicates (#831 #1 or meta.possibleMatch). The approach the new interface takes to deduplication is slightly different from the one used in the old WinADH client: instead of showing two records side by side, the new client shows what the final record would look like if all possible duplicates were merged together. This is meant to make comparing records easier, since all fields that are supposed to hold the same information are shown in the same spot, colour-coded by the source record they have been derived from.
To find potential duplicates, search for dedup: followed by the record type, e.g. dedup:cnp for persons. You may combine this with another condition to get smaller result sets, for example external_id:gnd AND dedup:cnp to limit the set to records from the GND file.
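Duplicate-candidate searches for several record types can also be generated as query strings. In this hypothetical sketch, the prefixes cnp and cnl are the ones used in this guide, while cnc for corporate bodies and cni for imprint names are assumptions; the optional narrowing clause is placed first, matching the external_id:gnd example.

```python
# cnp and cnl appear in this guide; cnc and cni are assumed prefixes.
RECORD_TYPES = ("cnp", "cnc", "cni", "cnl")


def dedup_query(record_type: str, narrow_by: str = "") -> str:
    """Query for duplicate candidates of one record type, optionally narrowed."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type!r}")
    query = f"dedup:{record_type}"
    return f"{narrow_by} AND {query}" if narrow_by else query


if __name__ == "__main__":
    # One query per record type, each limited to records from the GND file.
    for rtype in RECORD_TYPES:
        print(dedup_query(rtype, "external_id:gnd"))
```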
The duplicate candidate records are shown in the right column. The first (white) one is the record you started with; since this is the record the others are finally merged into, you cannot deselect it from deduplication.
Please note: if there are many duplicate candidates, it might be easier to check the records one by one. To do that, start with “Hide all” (which turns all candidates grey), then re-activate the first candidate and see whether it is a duplicate. If it is not, uncheck it and go to the next one. If it is, you can click the other button next to the check mark (the one with three stripes on it) to switch the record's colour to white before you proceed with the next record. That way you know that everything white has been checked and is “ok”, and the number of different colours is a little less confusing.
Deselect the records that are not duplicates of the first one by clicking the green check mark in the box that represents the record in the right column. The box will turn grey and the fields that belong to that record will no longer be visible in the left column. As long as the box in the right column has a colour (other than grey, that is), the record will be merged into the first one once you hit Save.
Please note that by unchecking a record you declare that it is definitely not a duplicate of the first (white) one, and by checking it you declare that it definitely is. There is no “I'm not sure” option as there was in WinADH: if you are not sure, then it is not a duplicate.2)
Please also make sure to check the biographical dates (340), activity notes (350, 300) and titles (291) to decide whether records refer to the same entity. If that is not sufficient, also try looking at the source systems the records come from and see which titles they are linked to (this is not always possible, of course). Always bear in mind that you are most likely the person who has the final word on whether these records are duplicates. Once you hit Save, the records will no longer show up when someone checks for duplicates. If you say two records are not duplicates, they will probably never again be considered for merging, and if you say they are, they will probably never be separated again, even if they are not.
Once you have made your decisions and unchecked or checked the non-duplicates and duplicates, you can change the order of the fields. This might be desirable for the following fields: 200, 210, 212, 215, 340, 350. If any of these fields is repeated within a record, the first occurrence is used for generating the short display in the search result set. To change the order of the fields, drag and drop a field with the mouse upwards or downwards into a new position.
When done, hit Save. The first record will receive all the fields of the active (coloured) records, which in turn will get a redirect to the first record's ID. If you have merged some records and later discover that they are actually not duplicates, let us know at convert@gbv.de so that we can restore the original situation. Please note that we can only return to the state before merging: any editing that might have been done on the merged record will be lost.