Bar Codes 101
Introduction to Bar Codes
Bar Codes, and to a lesser extent Radio Frequency Identification (RFID) Tags, are almost ubiquitous in the modern laboratory, and for good reason. They can serve many purposes, from simplifying sample identification to communicating information between systems. However, if someone asked us to explain how a bar code works, once we got past the superficial, most of us would be at a loss. Although bar codes have greatly increased laboratory productivity and reduced the error rate in many laboratory functions, they are not a silver bullet for correcting sample identification and data transfer issues. To avoid shooting ourselves in our respective feet, or worse, it behooves us all to understand how bar codes work. This allows us to identify their weaknesses and to understand how to find the most value in their strengths. In this column I will attempt to provide an introduction to bar code use. In future columns we’ll take a more detailed look at how various bar codes with different symbologies are generated and used.
A bar code symbology is basically a description of a specific bar code. This includes both how it appears and how it is generated. You can think of the symbology as being analogous to a typographic font, but it extends beyond that, as many bar code symbologies include various error detection and error correction properties as well. The symbology also specifies the type of information that can be encoded. Some symbologies are numeric, while others can contain alphanumeric information, though sometimes with restrictions as to case.
Sample identification is the reason most frequently given for using bar codes in a laboratory. This is a bit of an oversimplification. One of the primary reasons for bar codes is that they are more easily machine readable than standard numerals and letters, but samples can be identified with human-readable numbers as well. The benefit of automated instrumentation is that there is less chance of misidentification due to human error. An additional benefit is providing linkages between systems that are not fully automated.
An example of this would be where a regulatory organization generated a mailing to a list of clients (many of whom could be considered to be very computer illiterate) who were due to have a subset of various required tests performed, notifying them that they needed to submit the specified test results by a given date. If these clients were given the option to have any certified lab they selected perform the tests, it makes no sense to clutter all the independent labs’ data systems with test requirements for people they will never hear from, nor does it make sense for the lab that eventually receives the samples to have to manually enter all of the customer data and requested tests. Instead, a bar code could be included in the mailing identifying the client, the tests to be run, and when the results were required. If the client returned that form to the selected lab with the required payment, the lab could just scan the order form to automatically prelog the samples for the designated customer, generate unique sample numbers, schedule the appropriate tests in their system, and generate an order to send the required sample kits out. This reduces the risk of the customer ordering the wrong tests, the lab shipping the test kits to the wrong person, or the lab scheduling the wrong tests.
A second example is where a sample may have been received and logged into a Laboratory Information Management System (LIMS), where a sample number was assigned. However, the analysis of the samples might require that they also be logged into another analysis system, such as a Chromatography Data System (CDS) or clinical analyzer. While it would be possible to just assign an arbitrary sample number to them, it is more prudent to log them into these systems using the sample number already assigned by the LIMS, particularly if the analyzer can generate an output file which can just be imported into the LIMS. The best approach is to have the LIMS directly generate the run list for the analyzer, but if that option is not available, the next best approach is to use bar codes, whether on the vial or the worksheet, and just scan these numbers directly into the other system.
Another example, as many analysts on the bench are still much more comfortable working with printed worksheets than with direct entry into a data system, is to print all worksheets with the sample number in both a human readable and bar code format. The analysts can then use these worksheets for confirming and setting up their samples, as well as recording all of their observations and intermediate results. Once the analysis of that sample batch is completed, the analysts can then scan the bar code to retrieve the appropriate sample in the LIMS for data entry. This does nothing to help ensure that the analyst does not transpose any information that they enter, but it reduces the chance that they might retrieve the wrong sample by transposing digits in the LIMS number.
[Figure: Example of Ogham alphabet used by 5th Century Celtic Tribes. (Barcode – 5th Century AD)]
Now that we’ve taken a look at some of the reasons for using bar codes, perhaps we should step back and take a look at exactly what a ‘bar code’ is. As we examine the 400+ bar codes currently in use, you might find that this is not quite as easy a question to answer as you might have thought [1]. Work on machine readable printed codes, as opposed to punched cards, has been going on since at least the late 1940s and they have been used in industrial applications since the early 1960s. However, one might be forgiven for failing to recognize some of the symbologies that fall under what we so cavalierly refer to as ‘bar codes’. The key to these codes is that there is some property of the symbols used whose variation can be detected. While we might be used to thinking of them as vertical black bars on a white background, this is a bit of an oversimplification. Other properties might be the surface roughness, color, reflectivity, etc. Additionally, before we get too impressed with our cleverness, it might be wise to remember that the Ogham alphabet, used by 5th Century Celtic tribes in Ireland, bears a lot of similarity to what we commonly think of as a bar code [2]!
One of the earliest working examples is the Sunburst Code developed by Charecogn Systems [3] for the U.S. Department of Agriculture. This was a radial circular symbology incorporating a disk broken down into 10 equal angular sectors with each sector composed of eight radial segments. The pattern on these segments is such that each sector can encode the numbers 0 through 9, a unique start symbol, and two ‘surplus’ codes. The patterns were chosen so that the start code remains uniquely identifiable, no matter what combination of other codes is used. Its radial nature, combined with the unique start sector, allows this code to be read from any angle.
A different take on these circular symbologies is represented by the Bull’s Eye Code introduced for supermarket check-out by IBM in 1971. It consisted of multiple concentric circles with the data encoded in the variable thickness rings and spaces. While this had the advantage of being able to scan it with a linear scanner from any direction, as long as you were careful to ensure that the scan cut through all of the rings, it proved unworkable for the time due to the tendency for the ink to smear. However, when it comes to technology it seems that nothing is ever completely dead, as this concept appears to undergo a periodic renaissance. An example from the early 1990s is called SureShot [4]. In function it is very similar to the Bull’s Eye Code, but is based on the popular Code 39 symbology. In this case, any line cutting through the central circle will produce the same scan pattern as would a Code 39 bar code.
[Figure: ArrayTag Symbol]
Some of the more esoteric designs include the ArrayTag, developed by Dr. Warren D. Little of the University of Victoria and commercialized by Array Tech Systems of Canada [5]. Visually, it is composed of hexagonal cells with complementary borders containing a hexagonal array of dots. Its irregular cell design means that cells can be nested together to enhance data capacity of the system with no quiet zones required. Its symbology is very forgiving regarding the scanning of uneven surfaces or from oblique angles and can be read from a distance of 50 meters. This tolerance built into its design may be due to the fact that it was originally designed for the forest industry to identify logs.
Another visually interesting symbology is the Snowflake Code [6]. A proprietary code developed by Electronic Automation Ltd., this code is widely used in the pharmaceutical industry. Like the circular codes, it can be read from any angle and supports sufficient data redundancy that it is still readable with 40 percent of the code damaged.
If you really want to encode a lot of information onto a document, whether obviously or subtly, you should investigate Xerox’s DataGlyph technology [7]. It consists of nothing more than forward slash (‘/’) and backslash (‘\’) marks, each potentially shorter than 1/100th of an inch, depending on the printing and scanning technology. With these small sizes, large blocks of information may be incorporated so that they appear to be no more than shading for other graphic elements. According to Xerox, with this code size, the entire Gettysburg Address would fit on a DataGlyph the size of a small US postage stamp. This symbology allows you to specify the amount of error correction data you want to include. When set appropriately, the full message can still be readily recovered, even after it’s been stapled (not to mention being folded, spindled, and otherwise mutilated).
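To make the idea concrete, here is a minimal Python sketch of slash-based encoding. It assumes, purely for illustration, that ‘/’ stands for a 1 bit and ‘\’ for a 0 bit, and it leaves out the synchronization and error correction data that a real DataGlyph carries. The function names are my own and are not part of any Xerox software.

```python
# Assumption for illustration only: '/' = 1 bit, '\' = 0 bit.
# A real DataGlyph also embeds synchronization and error correction data.
ONE, ZERO = "/", "\\"

def to_glyphs(text: str) -> str:
    """Render each byte of the UTF-8 text as eight slash marks."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)

def from_glyphs(glyphs: str) -> str:
    """Invert to_glyphs: slash marks back to bytes, then to text."""
    bits = "".join("1" if mark == ONE else "0" for mark in glyphs)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

marks = to_glyphs("LIMS")
print(marks)               # 32 slash marks, eight per character
print(from_glyphs(marks))  # LIMS
```

Printed small enough, a block of such marks reads to the eye as a gray texture rather than as data.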
[Figure: Postal Numeric Encoding Technique (POSTNET) sorting code]
[Figure: Canadian PostBar]
Of course, not all bar codes are quite so esoteric. Some of them even look like what we think of as bar codes, though they don’t necessarily work the same way. For example, the United States Postal Service (USPS) has developed what they refer to as the Postal Numeric Encoding Technique (POSTNET) sorting code [8] for embedding ZIP Code information. Other countries use somewhat similar codes, such as PostBar [9], which is used by Canada Post. Unlike commonly encountered bar codes, such as Code 39, these codes use a bar’s height, rather than its horizontal position or width, to encode their information. This approach is sometimes referred to as a ‘height modulated symbology’ in contrast to a ‘width modulated symbology’.
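As a rough illustration of height modulation, the following Python sketch builds the tall/short bar sequence for a ZIP Code. It reflects the POSTNET scheme as I understand it: five bars per digit with exactly two tall, a trailing check digit that brings the digit sum to a multiple of 10, and a tall frame bar at each end. The function names and the text rendering are my own.

```python
# Five bars per digit, exactly two of them tall.
# '1' marks a tall (full-height) bar and '0' a short (half-height) bar.
POSTNET_DIGITS = {
    "0": "11000", "1": "00011", "2": "00101", "3": "00110", "4": "01001",
    "5": "01010", "6": "01100", "7": "10001", "8": "10010", "9": "10100",
}

def postnet_check_digit(digits: str) -> int:
    """Digit that brings the sum of all digits up to a multiple of 10."""
    return (10 - sum(int(d) for d in digits) % 10) % 10

def postnet_bars(zip_code: str) -> str:
    """Return the bar heights for a 5, 9, or 11 digit ZIP Code."""
    digits = zip_code.replace("-", "")
    if not digits.isdigit() or len(digits) not in (5, 9, 11):
        raise ValueError("expected 5, 9, or 11 digits")
    digits += str(postnet_check_digit(digits))
    body = "".join(POSTNET_DIGITS[d] for d in digits)
    return "1" + body + "1"  # tall frame bars at both ends

def render(bars: str) -> str:
    """Crude text picture: '|' for a tall bar, '.' for a short one."""
    return "".join("|" if bar == "1" else "." for bar in bars)

print(render(postnet_bars("23219")))
```

Note that every bar has the same width and spacing; only the heights carry information.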
From there, it is just another short step back to the symbologies that we typically think of when someone says ‘bar code’. Yet there is even more variation than most people might suspect. I won’t try to cover all of these factors in detail here, but will provide some general examples, so please keep in mind that due to space constraints some of these examples could be considered oversimplifications [10]. Within width modulated symbologies, codes can be subdivided into discrete and continuous. As the name suggests, with a discrete code every character is independent of its surrounding characters and is separated from them by what is referred to as an intercharacter gap. The size of the intercharacter gap allowed varies with the specific symbology, but is generally a loose value. The impact of this is that the intercharacter gap can vary between one character and another. The practical effect of this is to broaden the types of printing technology that can be used. For example, you could use a classic movable type printing press or even a typewriter-type mechanism and print all of the bar code lines of a character at once. One of the implications of this is that the beginning and end of each code character must be explicitly defined. For this reason, the character encoding for discrete symbologies requires that each character begin and end with a bar.
In contrast, with continuous symbologies the start of one character is used to define the end of the previous one. Basically, this means that each character in a continuous symbology starts with a bar and ends with a space. One of the implications of this is that continuous symbologies tend to be more compact than similar discrete symbologies, as they eliminate the required intercharacter space.
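A rough comparison shows the size of this effect. The sketch below estimates symbol length for Code 39 (discrete, nine elements per character, three of them wide) and Code 128 (continuous, 11 modules per character). The 3:1 wide-to-narrow ratio and the one-unit intercharacter gap are assumed values chosen for illustration, quiet zones are ignored, and Code 128 is assumed to use one symbol character per data character plus a start character, a check character, and its 13-module stop pattern.

```python
def code39_width(n_chars: int, wide_ratio: float = 3.0, gap: float = 1.0) -> float:
    """Length of a Code 39 symbol in narrow-element units, quiet zones excluded."""
    symbols = n_chars + 2                    # data plus the start and stop '*'
    per_symbol = 6 + 3 * wide_ratio          # six narrow and three wide elements
    return symbols * per_symbol + (symbols - 1) * gap

def code128_width(n_chars: int) -> int:
    """Length of a Code 128 symbol in modules, quiet zones excluded."""
    symbols = n_chars + 2                    # start character and check character
    return symbols * 11 + 13                 # plus the 13-module stop pattern

for n in (5, 10, 20):
    print(n, code39_width(n), code128_width(n))
```

Under these assumptions a ten character message needs 191 narrow-element widths in Code 39 but only 145 modules in Code 128, and the difference grows with message length.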
To add yet another variable, different width-modulated symbologies may use a different number of element widths. Typically these are classified as employing either two element widths (basically wide and narrow) or multi-element widths, though I think this classification is rather artificial. The major positive impact of using more than one width is that you can increase the information density in your code. The more different widths you use, the higher this density can be. The major negative impact is that once you start using more than one width, the ratio of those widths and the tolerances applied to the widths, both in printing the codes and reading them, become critical, since if you can’t tell the difference between one bar and another, your code is worthless. The greater the number of widths used, the tighter these tolerances have to be.
While it is possible to devise codes where the width of a bar varies but the space between bars is a constant, in many multi-width symbologies the width of the spaces can vary as well. If only to simplify decoding, most multi-width symbologies are designed to be modular. In a bar coding context, this means that the coding for a character is assigned a fixed length consisting of an integer number of equilength ‘modules’ with all components of the symbology, whether bars or spaces, consisting of an integer number of modules.
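The module idea can be sketched in a few lines of Python. The example collapses a string of dark and light modules into bars and spaces, using the 11-module Code 128 Start B character as sample data. The string representation (‘1’ for a dark module, ‘0’ for a light one) and the function name are my own conventions.

```python
from itertools import groupby

def run_lengths(modules: str) -> list:
    """Collapse a module string ('1' = dark, '0' = light) into bars and spaces."""
    return [("bar" if value == "1" else "space", len(list(run)))
            for value, run in groupby(modules)]

# Code 128 Start B character: 11 modules forming three bars and three spaces.
start_b = "11010010000"
elements = run_lengths(start_b)
print(elements)
print(sum(width for _, width in elements))  # total number of modules
```

Every element is a whole number of modules wide, and the widths always add up to the fixed character length, which is what lets a decoder split a scan into characters.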
An additional classification criterion is whether a code is interleaved or not. With an interleaved code, each bar code character actually encodes two values. The first character is encoded by the width pattern of the bars, while a second character is encoded by the width pattern of the spaces.
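Interleaved 2 of 5, a numeric symbology not otherwise covered in this column, is the usual example. The Python sketch below uses its published wide/narrow digit patterns as I understand them; writing bars in upper case and spaces in lower case is just my own notation for showing which element carries which digit.

```python
# Wide (W) / narrow (N) patterns for each digit: five elements, two wide.
I25_WIDTHS = {
    "0": "NNWWN", "1": "WNNNW", "2": "NWNNW", "3": "WWNNN", "4": "NNWNW",
    "5": "WNWNN", "6": "NWWNN", "7": "NNNWW", "8": "WNNWN", "9": "NWNWN",
}
DIGIT_FOR = {pattern: digit for digit, pattern in I25_WIDTHS.items()}

def encode_pair(first: str, second: str) -> str:
    """Bars (upper case) carry the first digit, spaces (lower case) the second."""
    return "".join(bar + space.lower()
                   for bar, space in zip(I25_WIDTHS[first], I25_WIDTHS[second]))

def decode_pair(elements: str) -> str:
    """Separate the bar widths from the space widths and look each up."""
    bars = elements[0::2]
    spaces = elements[1::2].upper()
    return DIGIT_FOR[bars] + DIGIT_FOR[spaces]

pair = encode_pair("3", "8")
print(pair)               # WwWnNnNwNn
print(decode_pair(pair))  # 38
```

Because the digits travel in pairs, a code of this kind can only hold an even number of digits.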
There are a number of other factors in relation to bar codes which together determine how error resistant they are, how much data overhead they consume, and their data density. Among these are:
- Start and Stop Characters, which respectively indicate the start and stop of a bar code. Some codes, such as Code 39, use the same character for both. Other symbologies use distinct characters, which might also be used to indicate the scanning direction.
- Bi-directional, which indicates a code that can be scanned from either direction. The need to know the direction of scan is obvious, as the same code group will decode to two entirely different things, depending on which direction you use to process it.
- Self-Clocking indicates that the code is constructed in such a way that it provides a time base for resolving the size of a code’s modules. Without this information, it is impossible to determine what a code is, as you would have no information to tell you whether a bar was one, two, three, or even more modules wide.
- Check Character or Check Digit refers to one or more characters embedded in a symbol to allow a bar code scanner to confirm that a valid read has been made by matching it to the results of a mathematical function applied to the rest of the decoded characters.
- Error Correction refers to additional information embedded in the code which allows a scanner to recreate missing data from a damaged code. The amount of error correction information embedded in the code determines how much damage can occur to the code before it can no longer be recovered. However, the trade-off here is that the more error correction information embedded in the code, the less unique data can be contained in the given code.
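The check digit idea in the list above can be sketched with the modulo 10 scheme used by UPC-A, chosen here only as a familiar example; other symbologies apply different functions. The function names are hypothetical.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Check digit for the first eleven digits of a UPC-A number."""
    odd = sum(int(d) for d in first_eleven[0::2])   # 1st, 3rd, 5th, ... digits
    even = sum(int(d) for d in first_eleven[1::2])  # 2nd, 4th, 6th, ... digits
    return (10 - (3 * odd + even) % 10) % 10

def is_valid_upc(code: str) -> bool:
    """True when the twelfth digit matches the check digit of the first eleven."""
    return (len(code) == 12 and code.isdigit()
            and upc_check_digit(code[:11]) == int(code[11]))

print(upc_check_digit("03600029145"))  # 2
print(is_valid_upc("036000291452"))    # True
print(is_valid_upc("036000291453"))    # False: last digit misread
print(is_valid_upc("036000291542"))    # False: two digits transposed
```

A check digit only detects that something is wrong. Recovering the original data requires the additional redundancy described under error correction.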
The following codes are illustrative of what we commonly picture when we think of bar codes and provide examples of the terms just provided.
- Code 39, also known as Code 3-of-9. Each character consists of nine elements, five bars and four spaces, of which three are wide and six are narrow, making it a fixed-width discrete code. Each symbol begins and ends with the asterisk start/stop code. While the ‘native code’ will only encode the numerals 0-9, the upper case letters ‘A’-‘Z’, and a small set of symbols, an enhanced version of the code can encode the full set of 128 ASCII characters, including the lower case letters ‘a’-‘z’, by using some of the symbol characters as shift, or ‘precedence’, codes. However, this only works when the scanner is also programmed to use the full ASCII mode.
- Code 128. This code is a high density continuous symbology supporting the entire 128 ASCII character set. Each character consists of 11 modules composed of three bars and three spaces. It incorporates three different character sets, selected by the use of appropriate start and shift characters. It includes an explicit check digit for use in validating the symbol decoding.
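The shift mechanism of full ASCII Code 39 can be illustrated with a deliberately partial Python sketch. It handles only digits, upper case letters, the hyphen, period, and space, plus lower case letters, which are written as ‘+’ followed by the upper case letter. Other characters use further shift pairs that are omitted here, so the sketch rejects them. The function name is my own.

```python
import string

# Characters passed through unchanged in this sketch.
PASS_THROUGH = set(string.digits + string.ascii_uppercase + "-. ")

def code39_full_ascii(text: str) -> str:
    """Return the Code 39 character sequence, framed by '*' start/stop codes."""
    out = []
    for ch in text:
        if ch in PASS_THROUGH:
            out.append(ch)
        elif ch in string.ascii_lowercase:
            out.append("+" + ch.upper())   # '+' acts as the shift character
        else:
            raise ValueError(f"character {ch!r} is not covered by this sketch")
    return "*" + "".join(out) + "*"

print(code39_full_ascii("Lims-42"))  # *L+I+M+S-42*
```

A scanner left in its native mode would report this symbol as L+I+M+S-42, which is why both ends of the system have to agree on the mode.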
Beyond these width-modulated codes are two other types that you might encounter. The first type might be considered an adaptation of classic width modulated codes and are sometimes referred to as two-dimensional (2-D) stacked symbologies. The primary impetus for their development is the attempt to overcome the issue of overly long bar code strings for symbols containing a lot of information and to avoid the scanning issues that went with them. Examples of 2-D stacked symbologies include the following:
- Code 49. Recognized as the first stacked bar code symbology, this code consists of two to eight rows with eight characters per row. Each row is scanned individually, in any order, with the rows being reassembled automatically in the scanner.
- Code 16K. Data characters are encoded using a negative version of the Code 128 patterns (i.e. white bars and black spaces) and it allows up to 16 stacked rows with five data characters per row [11].
- PDF417. This is both a high-density and high-capacity symbology, which is reflected in the PDF of its name, which stands for Portable Data File [12] (not Adobe’s Portable Document Format). It is quite different from the above stacked symbologies in appearance, as it does not include any separator bars. The encoding of each character consists of four bars and four spaces and it allows 3 to 90 rows and up to 30 columns of data [13]. It compensates for the lack of a separator bar by using a different encoding scheme for each row, which it cycles through every three rows. A single PDF417 symbol can contain over 1,000 data characters and by using the ‘Macro PDF417’ concatenation scheme, up to 99,999 symbols can be linked.
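The three-row cycle used by PDF417 can be sketched in a few lines of Python. In the PDF417 specification, as I understand it, the three encoding schemes are called clusters and are numbered 0, 3, and 6; the zero-based row index and the function names are my own conventions.

```python
def pdf417_cluster(row: int) -> int:
    """Cluster number (0, 3, or 6) used by a zero-based row index."""
    return (row % 3) * 3

def crossed_into_new_row(previous_cluster: int, current_cluster: int) -> bool:
    """A change of cluster tells the decoder the scan line has left its row."""
    return previous_cluster != current_cluster

print([pdf417_cluster(row) for row in range(6)])  # [0, 3, 6, 0, 3, 6]
```

Since neighboring rows never share a cluster, a scan line that drifts from one row into the next is detectable without any separator bar.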
Beyond 2-D stacked symbologies are two-dimensional matrix symbologies. The amount of data that these symbols can hold varies with their design, but they generally can store the data at higher densities than other symbologies of the same physical size. In addition, many of them offer superior data correction capabilities to those available in other symbologies. Examples of 2-D matrix symbologies include the following:
- DataMatrix. This code can contain anywhere from a few bytes of data up to 2,335 alphanumeric characters [14]. The solid ‘L’-shaped border is used to identify and orient the code and is called the ‘finder pattern’, while the alternating borders on the opposite sides, called the ‘timing pattern’, provide clocking information, as well as identifying the number of rows and columns. The maximum amount of unique information a symbol can contain depends on the amount of error correction data included.
- Code One. This was the first matrix symbology to be released in the public domain and was invented in 1992 by Ted Williams [15].
- QR Code. Short for Quick Response Code, the QR Code was developed in 1994 by Toyota subsidiary Denso-Wave [16]. While initially used to mark automotive parts, it is rapidly gaining popularity as a way to provide input information to camera equipped smart phones. It comes in a number of fixed capacity sizes and can hold up to 2,953 bytes of information or, in its more compact encoding modes, a maximum of 4,296 alphanumeric characters or 7,089 numeric characters. Due to its error correction characteristics, you will frequently find this code modified from the specification’s format to include logos and other features to draw attention to it. We’ll take a closer look at these in a future column.
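The three QR Code capacity figures can be reproduced from the bit budget of the largest symbol. The Python sketch below assumes the published QR Code parameters as I understand them: a version 40 symbol at the lowest error correction level (L) carries 2,956 data codewords of 8 bits each; a data segment starts with a 4-bit mode indicator and a character count field of 14, 13, or 16 bits for numeric, alphanumeric, and byte mode; numeric mode packs three digits into 10 bits; and alphanumeric mode packs two characters into 11 bits.

```python
DATA_CODEWORDS_40L = 2956   # version 40, error correction level L
MODE_INDICATOR_BITS = 4
COUNT_FIELD_BITS = {"numeric": 14, "alphanumeric": 13, "byte": 16}

def max_characters(mode: str) -> int:
    """Largest message that fits in a single segment of the given mode."""
    bits = DATA_CODEWORDS_40L * 8 - MODE_INDICATOR_BITS - COUNT_FIELD_BITS[mode]
    if mode == "numeric":
        groups, rest = divmod(bits, 10)          # three digits per 10 bits
        extra = 2 if rest >= 7 else 1 if rest >= 4 else 0
        return groups * 3 + extra
    if mode == "alphanumeric":
        pairs, rest = divmod(bits, 11)           # two characters per 11 bits
        return pairs * 2 + (1 if rest >= 6 else 0)
    return bits // 8                             # byte mode: 8 bits each

for mode in ("numeric", "alphanumeric", "byte"):
    print(mode, max_characters(mode))
```

Choosing a higher error correction level lowers the number of data codewords and therefore each of these limits.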
One of the problems with using bar codes is that, just like reports from computers, people have a tendency to believe them unquestioningly, as if, since it’s a machine, it can’t possibly make a mistake. Well, whether that’s true or not, the people that made the machine certainly could. Though to be truthful, most bar code errors tend to be due to damaged or misprinted labels. But I’ve seen enough misread good labels to know that you can have errors occur for other reasons as well. Whether it’s operator error with the scanner or a bad software routine in the scanner, you’ll have bad data coming out. I’ve seen multiple instances of bar code printers randomly printing corrupt labels as well. It’s always disconcerting to discover that the sample pulled back when you scan the bar code is not the same as the one from typing in the human readable number. For that reason alone, I’d encourage you to study up on the symbology of any bar codes that you are using. You might not be able to decode a bar code by hand as rapidly as a scanner can, but being able to do so can tell you an awful lot about where the problem in your system is occurring. When it comes time to validate your systems, a lot of people don’t bother putting much time and effort into validating their bar code equipment. After all, what could possibly go wrong? Well, Murphy and his laws seem to just be waiting for attitudes like that, so anytime you start to feel complacent, you should treat that as a wake-up call and take another look around.
[Figure: Bar Code Building in St. Petersburg]
I’ve dealt with a lot of bar code printers of a brand and model that a lot of people would swear was as solid as a rock, and overall they seemed to be, except for a batch manufactured over a two day period with a different firmware version. The error detection and correction codes designed into many of the newer bar codes have helped minimize this problem, but they can only do so much. If a bad number is fed into a bar code printer, it may well generate a label with a perfectly valid bar code, including all of the check digits; unfortunately it encodes the wrong number, so there is nothing that the detection circuits in the scanner can do to detect it! So yes, proper validation technique says that you should revalidate your system whenever something, such as a printer’s firmware, changes, but many times you’ll find that nobody bothered to let you know anything changed, so you still have to be proactive to hunt these issues down.
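This limitation is easy to demonstrate. The Python sketch below uses a hypothetical eight digit sample number and an assumed weighted modulo 10 check digit; neither comes from any particular LIMS or printer. The point is only that a label built from the wrong number passes the scanner’s check, and that the error shows up only when the decoded value is compared with what was supposed to be printed.

```python
def check_digit(number: str) -> int:
    """Weighted modulo 10 check digit (weights 3, 1, 3, ... from the right)."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(number)))
    return (10 - total % 10) % 10

def make_label(number: str) -> str:
    """What the printer encodes: the number it was given plus a check digit."""
    return number + str(check_digit(number))

def scanner_accepts(label: str) -> bool:
    """The only test a scanner can apply: is the label internally consistent?"""
    return (label.isdigit() and len(label) > 1
            and check_digit(label[:-1]) == int(label[-1]))

def matches_intended(intended: str, label: str) -> bool:
    """Validation step: compare the decoded number with the intended one."""
    return scanner_accepts(label) and label[:-1] == intended

intended = "20240117"             # hypothetical sample number
corrupted = "20240171"            # the number the printer actually received
bad_label = make_label(corrupted)

print(scanner_accepts(bad_label))             # True: the bar code is valid
print(matches_intended(intended, bad_label))  # False: but it is the wrong sample
```

A periodic print-and-scan-back comparison of this kind is one way to catch a misbehaving printer before its labels reach the bench.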
In conclusion, let this photo of the Shtrikh kod (Bar code) building [17] in St. Petersburg, Russia, serve as a reminder to you. It reflects both the importance that bar codes have gained and the risks you assume if you don’t monitor their use. If you look a bit more closely at the building, no matter how much it might resemble a UPC code at a glance, the pattern is not a valid code and the human readable number does not match the code or the symbology specification. So take the time to make sure you understand how the symbologies you are using work and the type of data that they can contain. In future columns we’ll take a closer look at some of these codes and how to use them.
1. Barcode Learning Center – Barcode Symbologies and Barcode Fonts. SystemID Warehouse – Bar Code Learning Center at <http://www.systemid.com/barcode_learning_center/symbologies.asp>
2. Barcode Applications. Barcode Gulf at <http://www.barcodegulf.com/technology56.asp>
3. Torrey, B.M. MACHINE READABLE CODE TRACK – Charecogn Systems, Inc. at <http://www.freepatentsonline.com/3636317.html>
4. Palmer, R.C. The Bar Code Book: Reading, Printing, and Specification of Bar Code Symbols. (Helmers Publishing: Peterborough, NH, 1995).
5. Barcode Education » ArrayTag. ROSISTEM Bar Code » Barcode Education at <http://www.barcode.ro/tutorials/barcodes/arraytag.html>
6. Harmon, C.K. Reading Between The Lines – Chapter 6, page 2. Q.E.D. Systems at <http://www.qed.org/RBTL/chapters/ch6.2.htm>
7. Xerox – USA – XSIS: DataGlyphs. Xerox Special Information Systems at <http://www.xerox.com/Static_HTML/xsis/dataglph.htm>
8. What is a POSTNET Barcode and How Can You Decode It? – Developer Zone – National Instruments. National Instruments at <http://zone.ni.com/devzone/cda/tut/p/id/5027>
9. PostBar – Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia at <http://en.wikipedia.org/wiki/PostBar>
10. BARCODE SYMBOLOGIES. Bar Code Island at <http://www.barcodeisland.com/symbolgy.phtml>
11. Barcode Online Reference – CODE 16K. Teklynx Newco SAS at <http://www.teklynx.com/barcodes/article_46.html>
12. PDF417 – Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia at <http://en.wikipedia.org/wiki/PDF417>
13. PDF417 Barcode FAQ and Tutorial by IDAutomation®. IDautomation at <http://www.idautomation.com/pdf417faq.html>
14. Data Matrix – Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia at <http://en.wikipedia.org/wiki/Data_Matrix>
15. Barcode Education » Code 1 (Code One). ROSISTEM Bar Code at <http://www.barcode.ro/tutorials/barcodes/code1.html>
16. QR code – Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia at <http://en.wikipedia.org/wiki/QR_Code>
17. Shtrikh kod, Vitruvius and sons | St. Petersburg | Russia | MIMOA. at <http://www.mimoa.eu/projects/Russia/St.%20Petersburg/Shtrikh%20kod>
John Joyce is a laboratory informatics specialist based in Richmond, VA. He may be reached at [email protected].