Litigation Support Scanning Technical Specifications
This document outlines the technical specifications we utilize for litigation support scanning and file output.
Media Labels
The following information will be visible on the CD or DVD:
| Required Fields | Sample Values and Examples |
| Vendor Name | ACME Scanning and Coding |
| Vendor Address | 123 Main Street |
| Vendor Phone | (555) 555-1212 (voice) / (555) 555-1213 (fax) |
| Date of Media Creation | 12/02/2003 |
| Format Type | CD, DVD |
| Volume Name | Examples are "FER001, FER002, FER003" |
| X/Y | Examples are "1 / 3", "2 / 3", "3 / 3"; or just "1 of 1" |
| Bates Ranges | Examples are "FER000001 - FER001300" |
| Client-Matter Number | Example, "320123 - 00123" |
| Image Count | 13,000 TIF images |
16-Bit vs. 32-Bit
Older computer software could only use filenames and folder names of very limited length. This is known as the 8.3 naming convention: up to 8 characters for the name and 3 for the extension. If a filename is longer than 8 characters, 16-Bit programs truncate the name, so the filename "AMURPHY0000001.TIF" becomes "AMURPH~1.TIF". Many vendors still use older software that restricts filenames in this way. As such, they can create a file named 0000001.TIF but not AMURPHY0000001.TIF. National Scanning utilizes current processing tools and is capable of outputting 32-bit (long) file names.
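To make the truncation concrete, here is a minimal Python sketch that approximates the short name an older program would show. The function name `short_name_8_3` is hypothetical, and the sketch is deliberately simplified: it always appends "~1" and ignores the collision numbering ("~2", "~3") and character substitutions that real systems apply.

```python
def short_name_8_3(filename: str) -> str:
    """Approximate the 8.3 alias of a long filename (simplified, no collision handling)."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: the whole name is the stem.
        stem, ext = filename, ""
    stem = stem.upper()
    ext = ext.upper()[:3]
    if len(stem) > 8:
        # Keep the first six characters and append "~1".
        stem = stem[:6] + "~1"
    return f"{stem}.{ext}" if ext else stem


print(short_name_8_3("AMURPHY0000001.TIF"))
print(short_name_8_3("0000001.TIF"))
```

Names that already fit within 8 characters pass through unchanged, which is why 0000001.TIF is safe under older software.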
File and Folder Names
- Only the characters A...Z and the numbers 0...9 are valid
- Filenames should be unique, matching the image key
- Image folder names should be zero-padded to 3 wide (i.e. 001, 002, 003, 004...)
NOTE: The filename must match the image key. The only exception is where the image key contains additional characters that must be echoed in the .TIF file name.
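The naming rules above can be checked with a few lines of Python. This is only a sketch; the helper names (`is_valid_name`, `image_folder_name`, `tif_filename`) are hypothetical and not part of any vendor tool. Note that `tif_filename` simply echoes the image key, so the exception described in the note still applies to keys that carry additional characters.

```python
import re

# Only the characters A...Z and the numbers 0...9 are valid.
VALID_NAME = re.compile(r"[A-Z0-9]+")


def is_valid_name(name: str) -> bool:
    """Return True when a file or folder name uses only A-Z and 0-9."""
    return VALID_NAME.fullmatch(name) is not None


def image_folder_name(index: int) -> str:
    """Zero-pad an image folder number to 3 wide (001, 002, ...)."""
    return f"{index:03d}"


def tif_filename(image_key: str) -> str:
    """The .TIF filename echoes the image key exactly."""
    return f"{image_key}.TIF"


print(is_valid_name("A001"), is_valid_name("A-001"))
print(image_folder_name(4), tif_filename("A001"))
```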
Opticon ".LOG" load file example:
| Database Image Key | Cross Reference Image Key | File Path To TIFF | Actual Filename |
| A001 | A001 | D:\A001\IMAGES\001\A001.TIF | A001.TIF |
Volume Names
Each CD should conform to the same standard: [PROJECT NAME][999]. So, if our project is named SMITH, the first three CDs delivered should be named SMITH001, SMITH002 and SMITH003. Note the zero-padding. Unless the project name is "VOL", the volume name of the first CD should never be "VOL001". Many applications use "VOL" as a default value, which has resulted in many CDs named "VOL001". Identifying the related case and content becomes difficult when the Firm has 2,000 CDs named VOL001. Use the project name as the volume prefix. The Bates prefix can be an acceptable project name, but the vendor must confirm the final decision with the firm. Vendors should never use their company name as the prefix. Some firms prefer to use the client-matter number as part of the project name.
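As an illustration of the [PROJECT NAME][999] convention, the following Python sketch builds volume names and flags the generic "VOL" default. Both function names are hypothetical, and the upper bound of 999 is an assumption taken from the three-digit placeholder.

```python
def volume_name(project: str, sequence: int) -> str:
    """Build a volume name such as SMITH001 from a project name and CD number."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    return f"{project.upper()}{sequence:03d}"


def is_default_volume_name(name: str, project: str) -> bool:
    """True when the volume uses the generic VOL prefix but the project is not named VOL."""
    prefix = name.upper().rstrip("0123456789")
    return prefix == "VOL" and project.upper() != "VOL"


print([volume_name("SMITH", n) for n in (1, 2, 3)])
print(is_default_volume_name("VOL001", "SMITH"))
```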
CD Content and Organization
Each CD should contain the same folders each time. This structure is important, as the media is not copied to a single subfolder. Instead, "data" goes under a different folder tree than "images". If not segregated, Litigation Support will have to perform this separation.
| D:\[VOLUME NAME]\ | Your CD should have a root folder, named the same as the volume name. |
| D:\[VOLUME NAME]\IMAGES\ | All images and image subfolders reside here. |
| D:\[VOLUME NAME]\OCR\ | This folder contains multi-page ASCII text files. The filename matches the "BegBates" key, e.g. A001.TXT, A011.TXT, and A013.TXT. |
| D:\[VOLUME NAME]\DATA\ | All "load", "database", "structure" and technical files reside here. |
| D:\[VOLUME NAME]\PROJECT\ | 1. Document coding instructions, 2. Project manuals, 3. Vendor contact information, 4. Source information, 5. Ranges information. |
| D:\[VOLUME NAME]\ATTACH\ | All native files reside here, as applicable. |
NOTE: Each CD must be self-contained. This means a CD containing A001...A010 must contain the images, database load file, OCR and cross reference file for A001...A010. A delivery of CD01...CD10 should have the load files for CD01 on CD01. Having load files for CD01...CD10 all reside on CD10 is incorrect. If the firm loses the "load files" CD, the corresponding CDs may not be usable. Further, this means tracking down 2 CDs every time there is a problem.
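A quick structural check can catch a volume that is missing one of its folders before it ships. The Python sketch below is only a partial test of self-containment: it confirms the expected folders exist under the volume root but does not open the load files. The function name `missing_folders` is hypothetical, and treating the folder for native files as optional is an assumption based on the "as applicable" wording.

```python
from pathlib import Path

# Folders expected under the volume root on every delivery.
REQUIRED_FOLDERS = ("IMAGES", "OCR", "DATA", "PROJECT")


def missing_folders(volume_root: Path) -> list[str]:
    """Return the required folders that are absent from the volume root."""
    present = {p.name.upper() for p in volume_root.iterdir() if p.is_dir()}
    return [name for name in REQUIRED_FOLDERS if name not in present]
```

For example, `missing_folders(Path(r"D:\SMITH001"))` would return an empty list for a complete volume.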
Organization of Sub-Folders
We understand that certain applications construct subfolders automatically in different configurations from that listed below. Therefore, this storage convention may not be possible for your organization without unreasonable effort. Standard sub-folders for each delivery are: Images, OCR, Data, Project and Attach.
Images Folder
| FOLDERS | CONTENT |
| D:\[VOLUME NAME]\IMAGES\0001\ | IMAGE0000001.TIF...IMAGE0001000.TIF |
| D:\[VOLUME NAME]\IMAGES\0002\ | IMAGE0001001.TIF...IMAGE0002000.TIF |
| D:\[VOLUME NAME]\IMAGES\0003\ | IMAGE0002001.TIF...IMAGE0003000.TIF |
| D:\[VOLUME NAME]\IMAGES\0004\ | IMAGE0003001.TIF...IMAGE0004000.TIF |
| D:\[VOLUME NAME]\IMAGES\0005\ | IMAGE0004001.TIF...IMAGE0005000.TIF |
NOTE: Zero-Padding is very important. This is especially true if you have 9999 subfolders. One of the reasons why it is important to have a standard number of images per folder is so that Litigation Support can easily determine where missing images may reside. In this fashion it is also simple to locate the right folder based upon the .TIF filename. Empty or "skipped" subfolders are not acceptable. If there is a folder 1 and 3, there should also be a folder 2.
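With a fixed number of images per folder, the folder for any image can be computed directly from its number. The Python sketch below assumes the values shown in the table above (1,000 images per folder, folder names 4 wide); the function names are hypothetical.

```python
IMAGES_PER_FOLDER = 1000  # from the table above
FOLDER_WIDTH = 4          # folder names are zero-padded to this width


def folder_for_image(image_number: int) -> str:
    """Return the zero-padded subfolder that should hold a given image number."""
    if image_number < 1:
        raise ValueError("image numbers start at 1")
    index = (image_number - 1) // IMAGES_PER_FOLDER + 1
    return f"{index:0{FOLDER_WIDTH}d}"


def skipped_folders(folder_names: list[str]) -> list[str]:
    """List folder numbers missing between 1 and the highest folder present."""
    present = {int(name) for name in folder_names}
    if not present:
        return []
    return [f"{n:0{FOLDER_WIDTH}d}" for n in range(1, max(present) + 1) if n not in present]


print(folder_for_image(1001))
print(skipped_folders(["0001", "0003"]))
```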
OCR Folder
This folder contains multi-page ASCII text files named for the image key or the first page of the document. Different document review systems load OCR in different fashions. As such, this document includes organization and formatting considerations valid for every format. To learn the actual technical syntax, please refer to the software section for examples. Regardless of software title, there are attributes common to each software, such as filenames and organization.
While different OCR programs produce different types of output, most firms require the vendor's product to match the naming conventions and organizational schemas outlined here. The OCR filename must match the complete image key. As most programs load the OCR by matching the image key to the text file, all OCR for the entire document should reside in the single image key text file.
| IMAGE KEY | BEGBATES | ENDBATES | OCR FILENAME | CONTAINS |
| A001 | A001 | A005 | A001.TXT | OCR for A001...A005 |
| A006 | A006 | A006 | A006.TXT | OCR for A006 |
| A007 | A007 | A070 | A007.TXT | OCR for A007...A070 |
All load files and files for loading, regardless of application, should reflect this rule. Note: The Bates numbers should be 7 digits wide. They are shortened here for brevity.
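The one-file-per-document rule can be sketched in Python as follows: given a document's BegBates and EndBates, return the OCR filename and the pages it should contain. The function names are hypothetical, and the sketch assumes simple Bates numbers (uppercase prefix plus digits, no suffix).

```python
import re


def split_bates(bates: str) -> tuple[str, int, int]:
    """Split a Bates number into prefix, numeric value and digit width."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", bates)
    if match is None:
        raise ValueError(f"unsupported Bates number: {bates!r}")
    prefix, digits = match.groups()
    return prefix, int(digits), len(digits)


def ocr_file_for(beg_bates: str, end_bates: str) -> tuple[str, list[str]]:
    """Return the OCR filename for a document and the pages it must cover."""
    prefix, first, width = split_bates(beg_bates)
    end_prefix, last, _ = split_bates(end_bates)
    if end_prefix != prefix or last < first:
        raise ValueError("EndBates must follow BegBates within the same prefix")
    pages = [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]
    return f"{beg_bates}.TXT", pages


print(ocr_file_for("A001", "A005"))
```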
Data Folder
All load files, except OCR, reside in this directory. While many vendors automatically include load file versions formatted for every major application, they all reside in the same folder. For the document review systems, there are basically two main load files: database load file and imagebase cross-reference load file. The first contains the discovery bibliographic coding including Bates. The second file is an index that correlates Bates to .TIF file.
Required files:
The following files should be found in every DATA folder of each delivery:
- Database Load File
- Database Structure File
- Imagebase Load File
Database Load File Format
The database should be an ASCII delimited file. It is preferable to use the delimiters appropriate to the application. The first line of the load file should be the field names. This provides the database administrator a level of confidence upon seeing the field names and values line up perfectly in the database.
Database Structure File
The database structure file is an ordered list of every field name, field type and size. In the case of a date field, the size should show format (e.g. MM/DD/YYYY, DDMMYYYY, YYYYMMDD, etc.).
Project Folder
These files identify the project, associated information such as attorney name and all the treatments such as stamping or OCR. These files are extremely helpful when identifying an "orphaned" volume. They are also helpful when an old project begins again after a one-year hiatus.
- Bibliographic Coding Instructions: This document identifies how the database was codified. This document shows which fields were coded and any rules around the codes themselves, such as valid document types.
- Project manuals: As available.
- Vendor contact information: A simple text file that tells us which vendor made the CD and their contact information.
- Source information: If there was an intake form, include it here. What was the source of the project - boxes or electronic discovery? Sometimes we need to backtrack from the produced CD to the originating data.
- Ranges information: Show a list of the Bates ranges on the CD.
Attach Folder
This is the home of native files. If there is a movie clip or spreadsheet associated with the collection, this is where it must reside. Once in the folder, the full path will be:
| FOLDERS | CONTENT |
| D:\[VOLUME NAME]\ATTACH\0001\ | The first 1,000 native files |
| D:\[VOLUME NAME]\ATTACH\0002\ | The next 1,000 native files |
| D:\[VOLUME NAME]\ATTACH\0003\ | And so on... |
| D:\[VOLUME NAME]\ATTACH\0004\ | ...and so forth |
| D:\[VOLUME NAME]\ATTACH\0005\ | |
NOTE: Zero-Padding is very important. This is especially true if you have 9999 subfolders. One of the reasons why it is important to have a standard number of native files per folder is so that you can easily determine where missing attachments may reside.
Bates Schemes
While it is helpful to have a "significant" Bates prefix ("ACME" versus "A"), brevity is not without merit. Use the "KISS rule": keep it simple, Simon. Just think how many times you will need to write or enter the prefix. Also, there are computer issues that almost mandate certain syntax. Please use the following conventions when constructing the prefix.
We recommend using the project name for the prefix. In this fashion, each collection, project, Bates and files carry the same name. If you have 5 separate discovery collections, you will have 5 projects, 5 unique Bates prefixes and 5 unique filename prefixes. When it is time to produce, a new Bates scheme can be applied.
Guidelines for creating a correct Bates prefix:
- No more than 5 characters wide
- Only use uppercase letters from A to Z
- Bates prefix should not end in "L", "O", "I" or "D"
- No spaces or hyphens between the prefix and number
- Good Bates prefixes include: A, AA, AAA, AAAA, AAAAA.
- Bad Bates prefixes include: 9A, 9A9, A9, A9-, A-A-A, 0-A-0, A 0001.
- OCR may mistake certain letters for the numbers "1" and "0".
- Suffix should be numeric
- Suffix should be zero-padded to four (4) positions, the ten-thousandths place
- Suffix should never contain spaces, hyphens, underlines or characters other than 0 through 9.
Good Bates suffixes include: .0001, .9999, .0100; Bad Bates suffixes include: .A, .0A, .A01, .A-1
Bates Number Examples:
The following two tables will show the identical ranges using proper and incorrect prefixes and suffixes.
Correct Bates ranges:
| BegBates | EndBates | Description |
| A001 | A010 | Prefix is short. Easy to see where prefix ends and the next Bates number begins. |
| A011 | A011 | |
| A011.0001 | A011.0026 | Zero-padded and numeric ensures proper sorting and software friendly format. |
| A011.0027 | A015 | |
Incorrect Bates ranges:
| BegBates | EndBates | Problem |
| A9001 | A9010 | Is the document number 1 or 9001? |
| A9011 | A9011 | Is the prefix A or A9? |
| A9011.A | A9011.Z | Suffix is a letter, resulting in sorting issues. |
| A9011.BA | A9015 | Suffix contains letter(s), resulting in sorting issues. |
| A9016.1 | A9016.10 | Suffix has no zero-padding resulting in bad sorting. |
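The prefix and suffix guidelines lend themselves to a pattern check. The Python sketch below encodes them as a regular expression: 1 to 5 uppercase letters not ending in D, I, L or O, then digits, then an optional four-digit suffix after a period. The names `BATES_PATTERN` and `is_valid_bates` are hypothetical, and the sketch does not enforce a particular width for the number.

```python
import re

BATES_PATTERN = re.compile(
    r"(?P<prefix>[A-Z]{0,4}[A-CE-HJKMNP-Z])"  # 1-5 letters, last one not D, I, L or O
    r"(?P<number>\d+)"                        # numeric portion, no spaces or hyphens
    r"(?:\.(?P<suffix>\d{4}))?"               # optional suffix, zero-padded to 4 digits
)


def is_valid_bates(value: str) -> bool:
    """Return True when a Bates number follows the prefix and suffix guidelines."""
    return BATES_PATTERN.fullmatch(value) is not None


for candidate in ("A001", "A011.0001", "A9011.A", "A9016.1", "AD001"):
    print(candidate, is_valid_bates(candidate))
```

A pattern cannot detect every problem. "A9001" passes because it reads as prefix "A" and number 9001, so the ambiguity between prefix "A" and prefix "A9" still has to be avoided when the scheme is chosen.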
Data Files
The following files must reside in the Data folder on every delivery by the vendor:
- Database load file,
- Database structure file, and
- Imagebase cross-reference load file.
1. Database Load File:
- Delimiters - Although this document does not truly favor one application over another, the Concordance standard delimiter characters have proven reliable time and again. They are: Comma (020), Quote (254), Newline (174)
- The first line of the database load file should be the field names.
- The name of the database load file should match the volume name.
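As a rough illustration, a delimited load file using these characters could be produced as in the Python sketch below. It reads the codes 020, 254 and 174 as decimal character values; the function names, the CRLF record terminator and the sample field names are assumptions for illustration only.

```python
FIELD_SEP = chr(20)   # "Comma" delimiter, separates fields
QUOTE = chr(254)      # "Quote" delimiter, wraps each field value
NEWLINE = chr(174)    # "Newline" delimiter, stands in for line breaks inside a field


def format_record(values: list[str]) -> str:
    """Quote each value and join the fields with the field separator."""
    cells = []
    for value in values:
        text = str(value).replace("\r\n", NEWLINE).replace("\n", NEWLINE)
        cells.append(f"{QUOTE}{text}{QUOTE}")
    return FIELD_SEP.join(cells)


def build_load_file(field_names: list[str], rows: list[list[str]]) -> str:
    """Return load file text whose first line holds the field names."""
    lines = [format_record(field_names)]
    lines.extend(format_record(row) for row in rows)
    return "\r\n".join(lines) + "\r\n"


text = build_load_file(["BEGBATES", "TITLE"], [["A001", "Line one\nLine two"]])
print(repr(text))
```

When writing the text to disk, a single-byte encoding such as Latin-1 keeps each delimiter as one byte, but the encoding a given review system expects should be confirmed first.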
2. Database Structure File:
National Scanning has a standard database structure we use for all databases (electronic and paper) unless the client requests otherwise. The structure file is a text file; the following sample is for illustration only.
| Field Name | Type | Size |
| Author | Paragraph | |
| Date | Date | YYYY/MM/DD |
| Title | Text | 60 |
| Pages | Number | 3.0 |
3. Imagebase Load File:
The following are the rules governing a good load file:
- The imagebase load file name should match the volume name
- All images referenced in the load file must be contained on the same volume
- Document breaks
- Page counts
- Image path: D:\IMAGES\[CLIENT#]\[MATTER#]\[DATABASE]\[VOLUME]\IMAGES\...
Note: While the path may seem long, it provides a standard everyone can understand. The database folder may seem redundant at first, until there are 12 databases for a given matter number. At that point, one becomes grateful for the database subfolder. National Scanning uses this structure for our litigation support scanning services for many reasons. When the load file does not match this path, the vendor will have to fix this. If Litigation Support has to fix it instead, the client may be paying twice for the same work.
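The image path convention can be assembled and checked programmatically. The following Python sketch uses the standard library's `PureWindowsPath`, which handles Windows-style paths on any platform; the function names and the sample client, matter, database and volume values are hypothetical.

```python
from pathlib import PureWindowsPath


def image_root(client: str, matter: str, database: str, volume: str,
               drive: str = "D:") -> PureWindowsPath:
    """Build the image root: D:\\IMAGES\\[CLIENT#]\\[MATTER#]\\[DATABASE]\\[VOLUME]\\IMAGES."""
    return PureWindowsPath(f"{drive}\\", "IMAGES", client, matter, database, volume, "IMAGES")


def matches_root(image_path: str, root: PureWindowsPath) -> bool:
    """True when a load file path sits somewhere under the expected image root."""
    return root in PureWindowsPath(image_path).parents


root = image_root("320123", "00123", "FER", "FER001")
print(root)
print(matches_root(r"D:\FER001\IMAGES\001\A001.TIF", root))
```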
Sample Opticon Load File:
| [Field 1] | [Field 2] | [Field 3] | [Field 4] | [Field 5] | [Field 6] | [Field 7] |
| A001 | [VOLUME] | D:\[VOLUME]\IMAGES\001\A001.TIF | Y | | | 2 |
| A002 | [VOLUME] | D:\[VOLUME]\IMAGES\001\A002.TIF | | | | |
| A003 | [VOLUME] | D:\[VOLUME]\IMAGES\001\A003.TIF | Y | | | 1 |
Here is an explanation of the Opticon load file format:
| [Field 1] | Production Number | This is a text field which contains the "Production" or "Control" or Bates number for that page of the document. It is a unique value and is the load file "key". |
| [Field 2] | Volume ID | This is also a text field. It should contain the Volume ID of the volume on which the images are delivered. |
| [Field 3] | Full DOS Path | This contains both the path to the image and the actual image filename. |
| [Field 4] | Document Break | This is a text field. If this particular image is the first page of a document, this field should contain a "Y" (Yes). |
| [Field 5] | Folder Break | This is a text field. It's fairly rarely used but if used is intended to work just like Document Break, i.e. it would contain a "Y" if this is the first page of a new folder. |
| [Field 6] | Box Break | This is a text field. Also rarely used but intended to work like Doc and Folder Break...would contain a "Y" if this is the first page of a new box. |
| [Field 7] | Pages | This is a text field although it contains numeric data. If this is the first page of a new document, "Document Break" will contain a "Y" and this field will show the number of pages for the document. |
Each of these fields is "separated", or "delimited", from the others by a comma. When a technician imports a load file into Opticon, the content for each field is divided by the commas. Therefore, one cannot have a directory named "\5,312,591 PATENT" since Opticon will view each comma as the start of the next field.
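The seven-field layout and the comma restriction can be sketched in Python as below. The function name `opticon_line` is hypothetical; the sketch leaves the folder break and box break fields empty and writes the page count only on the first page of a document, mirroring the sample above.

```python
from typing import Optional


def opticon_line(key: str, volume: str, path: str,
                 page_count: Optional[int] = None) -> str:
    """Format one load file line; pass page_count only for the first page of a document."""
    for value in (key, volume, path):
        if "," in value:
            # A comma inside a value would be read as the start of the next field.
            raise ValueError(f"comma not allowed in load file field: {value!r}")
    doc_break = "Y" if page_count is not None else ""
    pages = str(page_count) if page_count is not None else ""
    # Fields 5 and 6 (folder break, box break) are left empty in this sketch.
    return ",".join([key, volume, path, doc_break, "", "", pages])


print(opticon_line("A001", "FER001", r"D:\FER001\IMAGES\001\A001.TIF", 2))
print(opticon_line("A002", "FER001", r"D:\FER001\IMAGES\001\A002.TIF"))
```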
Database Conventions
There are two main categories of discovery: electronic and paper. Electronic discovery software extracts "metadata" from the file. The metadata contains fields and values ranging from email subject to the last print date of a spreadsheet. Different file types may yield different types of metadata.
This means the firm may need to pay for bibliographic coding for certain kinds of electronic discovery to achieve a complete database. If 20% of a database has no author information, this will impact search results and confidence. All electronic discovery yields "full text". Full text is quite literally all the text inside a word processing or spreadsheet file or any other electronic files. Full text removes the need for OCR. Like OCR, full text does not provide bibliographic coding such as author and recipient. Full text will provide 100% accurate content where paper OCR may be 98% accurate or better, depending on the quality of the paper.
Image Format
The majority of documents imaged only require black and white. On a less frequent basis, we may need color images. The following are our standards:
- Black and White images should be 300 DPI, Group IV TIFF;
- Single page TIFF images;
- Color images should be discussed on a per image or per document type basis.
OCR
We use auto-rotate and voting when generating OCR. Most OCR software offers an auto-rotate option. When auto-rotate is enabled, the software will OCR each image four times, rotated 90 degrees each time. It determines the best result and publishes the content to the load file. The majority of documents have the same orientation: portrait. Without auto-rotate, these documents can yield good results. The rest of the documents may be designed for a landscape layout, such as an HR chart. Other documents still may have been scanned "upside-down", resulting in garbage OCR. OCR voting is a process where multiple OCR programs compare results to determine the best results.
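The auto-rotate idea can be sketched in Python as follows. How a given OCR product decides which rotation is "best" is not specified here, so this sketch assumes a hypothetical callback, `ocr_at`, that returns the text and a confidence score for a given angle; it is not the API of any real OCR engine.

```python
from typing import Callable, Tuple


def best_rotation(ocr_at: Callable[[int], Tuple[str, float]]) -> Tuple[int, str]:
    """Run OCR at four rotations and keep the text with the highest confidence score."""
    results = {angle: ocr_at(angle) for angle in (0, 90, 180, 270)}
    best_angle = max(results, key=lambda angle: results[angle][1])
    return best_angle, results[best_angle][0]


# Stand-in for a real OCR engine: pretend the page was scanned upside-down.
def fake_ocr(angle: int) -> Tuple[str, float]:
    scores = {0: 0.20, 90: 0.10, 180: 0.95, 270: 0.30}
    return f"text at {angle}", scores[angle]


print(best_rotation(fake_ocr))
```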
The OCR text should best approximate and recreate the formatting found on the original image. The OCR field should never be just the words in one long string. No text at the top, bottom or either side should be clipped.
There should be a one document to one OCR text file ratio. The OCR filename must match the document image key. So, a 10 page document with the image key of AA001 should have a corresponding file AA001.TXT that contains the OCR for AA001 through AA010. Each page of OCR should have a line identifying the page number, or Bates number. In this fashion, people can search for any Bates number and find the correct document. Please include space between the OCR text and page marker.
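A document-level OCR file with page markers could be assembled as in the Python sketch below. The bracketed marker format, such as "[AA001]", and the function name are assumptions for illustration; the actual marker style should follow the review system in use.

```python
def build_ocr_document(pages: list[tuple[str, str]]) -> tuple[str, str]:
    """Combine per-page OCR into one text file named for the document's image key.

    pages is a list of (bates_number, page_text) pairs in page order.
    """
    if not pages:
        raise ValueError("a document needs at least one page")
    blocks = []
    for bates, text in pages:
        # A blank line separates the page text from its marker line.
        blocks.append(f"{text.rstrip()}\n\n[{bates}]\n")
    filename = f"{pages[0][0]}.TXT"
    return filename, "\n".join(blocks)


name, body = build_ocr_document([("AA001", "First page"), ("AA002", "Second page")])
print(name)
print(body)
```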
Slip-Sheets or Unitization Rules
If not already done by the client or the firm, the scanning company should place a slip-sheet between each document before scanning. After the documents are scanned, the vendor needs to provide logical document breaks. The Firm requires a one-document-to-one-database-record ratio. Between slip-sheeting during scanning and the logical document breaks service, this ratio should be guaranteed. Slip-sheets should not appear in the database or images. The resulting database must maintain the parent-child document relationships through the "BegAttach" and "EndAttach" fields.
When the firm requires the vendor to print documents to paper, a non-white slip-sheet must separate every document. Blue and dark green are the preferred colors for slip-sheets. If the slip-sheets use more than one color please refer to the color blindness specifications.
Label Information
The following information should appear on every label or package:
- Vendor Name
- Vendor Address
- Vendor Phone
- Deponent Name (Last name, First name)
- Dates of appearances (YYYY/MM/DD format)
- Deposition Date (YYYY/MM/DD format)
- Case Name
- Indicate whether synchronized
- Type of "sync" file (.MDB, .CMS, .PTF)
Delivery Media
One can buy a 200GB external USB 2.0 hard drive for ~$200. As most vendors charge an average of $25 per CD, ten CDs cost about $250, so any delivery of 10 CDs or more should come on a hard drive. Aside from cost savings, loading from a hard drive saves time. It is much easier and more expedient for Litigation Support to copy a single hard drive to the server than to copy 10 CDs.
Thank you for taking the time to review our technical specifications. Please contact us to review your project or with any questions at all.
Some segments of this document copyright (c) 2006 Ad Litem Consulting, Inc. This material may be distributed only subject to the terms and conditions set forth in the document license