Koha with no barcodes

Traditionally, Koha 3 depends on each item (we call them existencias in Spanish) having a barcode that uniquely identifies it. Circulation, for example, requires the librarian to scan the barcode of an item in order to circulate it.

At times this proves inconvenient, since many biblios (titles, or títulos in Spanish) have the same barcode printed on every item (usually the ISBN), forcing the library to print new, unique barcodes for each one of the items in existence (Koha has a nice barcode generator for that).

However, it’s usually not feasible to relabel all items with new barcodes, especially if you have millions of items nationwide. So I thought of an easy patch to Koha that allows circulating items based on the item number instead of the barcode.

First of all, you should set the barcode equal to the item number for those items where you don’t have any barcode recorded. This is best accomplished right after loading the MARC records into the database, using the MySQL console:

  UPDATE items SET barcode = itemnumber WHERE barcode IS NULL OR barcode = ''; -- drop the WHERE clause to overwrite every barcode

In my case, for over 1.1 million items, it took some 3 minutes 6 seconds to complete. There’s a drawback, however: you need to run this periodically as you add more items, but that is something your DBA can easily automate. At this point you can circulate items using the item number, and you can print barcodes with that number, but it’s still not easy for the librarian to either remember the item number or look it up before circulating.
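
The effect of restricting the update to items without a barcode can be tried out safely before touching a production database. What follows is a minimal sketch in Python (chosen only so it runs on its own), using an in-memory SQLite table as a stand-in for Koha’s MySQL items table. The three sample rows and the fill_missing_barcodes helper are hypothetical; only the itemnumber and barcode column names come from the statement above.

```python
import sqlite3


def fill_missing_barcodes(conn):
    """Copy itemnumber into barcode for items lacking one; return rows changed."""
    cur = conn.execute(
        "UPDATE items SET barcode = itemnumber "
        "WHERE barcode IS NULL OR barcode = ''"
    )
    conn.commit()
    return cur.rowcount


# Hypothetical stand-in for Koha's items table (SQLite instead of MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemnumber INTEGER PRIMARY KEY, barcode TEXT)")
conn.executemany(
    "INSERT INTO items (itemnumber, barcode) VALUES (?, ?)",
    [(1, "9780131103627"), (2, ""), (3, None)],  # made-up sample rows
)

changed_first = fill_missing_barcodes(conn)   # touches only items 2 and 3
changed_again = fill_missing_barcodes(conn)   # nothing left to fill
rows = conn.execute(
    "SELECT itemnumber, barcode FROM items ORDER BY itemnumber"
).fetchall()
print(changed_first, changed_again, rows)
```

Because the WHERE clause skips items that already carry a barcode, the statement can be re-run after each batch of new items without disturbing earlier ones, which is what makes it suitable for a periodic job.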

You can apply an easy patch on line 44 of the modules/catalogue/moredetail.tmpl file of the Intranet, providing a new link on the Items tab of a biblio to start the borrowing workflow for a specific item:

<!-- TMPL_UNLESS NAME="issue" --><a href="/cgi-bin/koha/circ/circulation.pl?barcode=<!-- TMPL_VAR NAME="itemnumber" -->">[Circulate item <!-- TMPL_VAR NAME="itemnumber" -->]</a><!-- /TMPL_UNLESS -->

Of course, circ/circulation.pl on the Intranet also needs a small patch to store the barcode in the session and then reuse it once the borrower is selected, near line 111:

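my $barcode = '';
if ( $session->param('barcode') ) {
  # A barcode was remembered from the previous request: use it once, then forget it
  $barcode = $session->param('barcode');
  $session->clear('barcode');
} elsif ( $query->param('barcode') ) {
  # First request: remember the barcode while the librarian selects a borrower
  $barcode = $query->param('barcode');
  $session->param('barcode', $barcode);
}

$barcode =~ s/^\s+|\s+$//g; # remove leading/trailing whitespace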
...
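
The request flow behind this patch can be modelled outside Koha. Below is a minimal sketch in Python (not the Perl of the actual patch), with plain dictionaries standing in for the CGI query and session objects. The pick_barcode name and the sample values are hypothetical.

```python
def pick_barcode(session, query):
    """Return the barcode to circulate, carrying it across one request via the session."""
    if session.get("barcode"):
        # Follow-up request: reuse the remembered barcode once, then forget it
        barcode = session.pop("barcode")
    elif query.get("barcode"):
        # First request: remember the barcode while a borrower is being selected
        barcode = query["barcode"]
        session["barcode"] = barcode
    else:
        barcode = ""
    return barcode.strip()  # remove leading/trailing whitespace


session = {}
first = pick_barcode(session, {"barcode": " 4711 "})     # link clicked on the Items tab
second = pick_barcode(session, {"borrowernumber": "42"})  # borrower selected afterwards
third = pick_barcode(session, {})                         # unrelated later request
print(repr(first), repr(second), repr(third))
```

The barcode survives exactly one extra request, which is all the workflow needs: the first request arrives from the item link, the second from the borrower selection.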

Restart your Web server and that’s it. You can now search for a biblio, go to the Items tab, select an item to be circulated, select a borrower, and the item is circulated. For returns, search for the user and go to the end of the page, where you can see all items in circulation, fines and return options. The workflow changes a little bit, but it’s the easiest way I’ve devised to operate a Koha ILS when barcodes are absent or outside your control.

Considerations for migrating CDS/ISIS databases to fully MARC-based ILSs

CDS/ISIS is an obsolete information storage and retrieval system (and also an information storage format) designed some 30 years ago to fill a need for libraries around the world. For several years UNESCO unfortunately invested time and money supporting it and freely (as in free beer, but as proprietary software) distributing it to several countries. Altogether, CDS/ISIS is now responsible for the overall underdevelopment of technology for libraries, especially in Latin America. Sadly, since UNESCO now seems reluctant to continue draining resources, there is an effort in LatAm to open-source CDS/ISIS-related technologies and bring them to the Web. Fair enough, but this doesn’t change the fact that CDS/ISIS is dead.

So, since it’s already dead, we’ll need to retrieve the records in our CDS/ISIS databases and migrate them to less ancient systems. Talk about safeguarding our heritage. MARC is an equally ancient format, designed by the Library of Congress, whose record structure is actually a standard (ISO 2709) for exchanging bibliographic records (CDS/ISIS never was), and the flavour we use in LatAm, MARC21, is a binary storage format. But of course we do have MARC-XML, which is widespread in Integrated Library Systems, both proprietary and open source. In Koha 3 we use MySQL to store MARC-XML when representing bibliographic records. Specialized open source software such as Zebra allows us to efficiently index and search MARC-XML data.
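
To make the shape of a MARC-XML record concrete, here is a minimal sketch in Python (used only so it runs without extra modules) that builds one with the standard library. The element and attribute names (record, leader, datafield, subfield, tag, ind1, ind2, code) and the namespace are those of the MARC21 slim schema. The build_marcxml helper, the placeholder leader and the sample title and author are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def build_marcxml(leader, datafields):
    """Serialize one record; datafields is a list of
    (tag, ind1, ind2, [(subfield_code, value), ...]) tuples."""
    ET.register_namespace("", MARC_NS)  # emit MARC-XML as the default namespace
    record = ET.Element(f"{{{MARC_NS}}}record")
    ET.SubElement(record, f"{{{MARC_NS}}}leader").text = leader
    for tag, ind1, ind2, subfields in datafields:
        field = ET.SubElement(
            record,
            f"{{{MARC_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder 24-character leader and made-up sample data
xml_text = build_marcxml(
    "00000nam a2200000 a 4500",
    [
        ("100", "1", " ", [("a", "García Márquez, Gabriel")]),
        ("245", "1", "0", [("a", "Cien años de soledad")]),
    ],
)
print(xml_text)
```

The same nesting (a record holding fields, fields holding subfields) is what the MARC::* modules expose as objects on the Perl side.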

Perl is the natural language of choice for migrating this kind of data, and there are libraries for both ISIS (Biblio::Isis) and MARC (MARC::*) which are already available in Debian, BTW. The following are some caveats I’ve found when migrating data from ISIS to MARC-XML:

  • “Indexing”. Records in CDS/ISIS are referred to using the MFN (master file number), which is a sequential integer assigned by CDS/ISIS; this is useless since end users (patrons) won’t search the catalogue using the MFN, and librarians would like to refer to a single item using a call number. In MARC you don’t have a unique number to refer to records. The logic of the MARC::* modules is easy to follow: you create an object (your record) containing objects (fields), which you dump to the screen or to a file, all cat’ed together. MARC is a format. Indexing is not the format’s issue.
  • Encoding. Given an MFN, Biblio::Isis throws you a hash. With little manipulation, this hash can be used to create a MARC::Record. So, if librarians have been using MARC fields and subfields in their CDS/ISIS database, migration can require very little logic (search for isis2marc in Google). However, in my scenarios encoding is always a problem. I’d like to cite two of these scenarios: one having a source encoding of cp850, requiring me to disassemble and then reassemble the whole data structure of Biblio::Isis to create a properly UTF-8 encoded record; and the other one having binary garbage coming from a mainframe, where accented vowels (Spanish) were preceded by a hex 0x82, except for n with tilde, which was preceded by a hex 0x84 (ibm437), so I preferred to use sed before running my code.
  • Holdings. CDS/ISIS doesn’t implement any logic about your holdings (also called items, or existencias in most Spanish-speaking countries) but it might store information about them, such as location and number of items. You’re forced to implement custom logic here, since not only is your source picky regarding holdings, but your target will be, too. Nowadays, ILSs are expected to be tweakable regarding which MARC field is used for holdings. Koha uses the 952 field.
  • Data quality. In the broadest sense of the term, you’d like to delete multiple space characters, maybe even build a thesaurus, skip undesirable subfields (indicators, subfields under 10) and such. You’ll need custom logic and you’ll also have to disassemble Biblio::Isis data structures. Data::Dumper proves handy for this.
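
The encoding and data-quality caveats above boil down to walking the whole record structure, decoding every value from the legacy code page and tidying it on the way. Here is a minimal sketch in Python (not the Perl used for the real migrations). It assumes a record shaped as a hash of field tags, each holding a list of occurrences that map subfield codes to raw bytes. That shape, the clean_record and squeeze_spaces helpers and the sample record are hypothetical and only approximate what Biblio::Isis returns.

```python
import re


def squeeze_spaces(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_record(isis_record, source_encoding="cp850"):
    """Disassemble the record, decode every subfield value from the source
    code page, tidy it, and reassemble the same structure holding str values."""
    cleaned = {}
    for tag, occurrences in isis_record.items():
        cleaned[tag] = [
            {
                code: squeeze_spaces(raw.decode(source_encoding))
                for code, raw in occurrence.items()
            }
            for occurrence in occurrences
        ]
    return cleaned


# Made-up sample record with cp850 bytes (0xA4 = ñ, 0xA1 = í, 0xA0 = á)
raw_record = {
    "100": [{"a": b"Garc\xa1a M\xa0rquez,  Gabriel "}],
    "245": [{"a": b"Cien a\xa4os   de soledad"}],
}

record = clean_record(raw_record)
title_utf8 = record["245"][0]["a"].encode("utf-8")  # bytes ready to be written out
print(record)
```

Decoding to text first and encoding to UTF-8 only when writing keeps the cleanup steps independent of the source code page. A second source with a different encoding would only need a different source_encoding value, as long as Python ships a codec for it.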

Such procedures are not particularly CPU- or RAM-intensive (I can migrate tens of MBs of data on my laptop in under two minutes while running a full-fledged desktop), but they are not instantaneous. With a migration logic which is quite profuse, goes deep inside Biblio::Isis, does decoding/encoding, queries external hashes and so on, I get roughly 390 records per second. But this is blazingly fast when compared with the time a modern ILS takes to bulk import a huge amount of records (Koha’s bulkmarcimport.pl gives me some 15 records/second) or with a state-of-the-art indexer such as Zebra (similar times).
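
To put those two rates side by side, here is a small worked calculation in Python. The rates of 390 and 15 records per second are the ones quoted above; the catalogue size of one million records is a hypothetical round number, not a measurement.

```python
def hours_needed(records, records_per_second):
    """Wall-clock hours to process `records` at a constant rate."""
    return records / records_per_second / 3600


CATALOGUE_SIZE = 1_000_000  # hypothetical number of records

convert_hours = hours_needed(CATALOGUE_SIZE, 390)  # ISIS to MARC-XML conversion
import_hours = hours_needed(CATALOGUE_SIZE, 15)    # bulk import into the ILS
slowdown = 390 / 15                                # how much slower the import is

print(f"conversion: {convert_hours:.2f} h")
print(f"bulk import: {import_hours:.1f} h")
print(f"import is {slowdown:.0f}x slower")
```

At these rates the conversion is a matter of minutes while the import takes the better part of a day, so the import step is the one to plan the migration window around.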
