Monday, October 26, 2015

Why does a checked PDF checkbox export a null value?

I am reading a PDF file with iTextSharp and want to know whether a particular checkbox is checked.

I tried the following code (mainly inspired by this question) to get the value, and was surprised to find that the checkbox's value is null even though it is clearly checked in a PDF viewer.


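using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

PdfReader reader = new PdfReader(@"C:\form.pdf");
using (StreamWriter writer = new StreamWriter(@"C:\scannedform.txt", true))
{
  // Dump the name, value, type, and appearance states of every form field
  AcroFields pdfFormFields = reader.AcroFields;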
  foreach (KeyValuePair<string, AcroFields.Item> kvp in pdfFormFields.Fields)
  {
    string fieldName = kvp.Key;
    string fieldValue = pdfFormFields.GetField(kvp.Key);
    writer.WriteLine("Field Name:       " + fieldName);
    writer.WriteLine("Field Value:      " + fieldValue);
    writer.WriteLine("Field Type:       " + pdfFormFields.GetFieldType(fieldName));

    String[] appearanceStates = pdfFormFields.GetAppearanceStates(fieldName);

    foreach (var state in appearanceStates)
    {
      writer.WriteLine("Field Options:    " + state);
    }
    writer.WriteLine("");
  }
}
reader.Close();

This outputs the following text:

Field Name:       My Checkbox
Field Value:      
Field Type:       2
Field Options:    Off
Field Options:    On

After verifying this in the debugger, I used the export form data feature in Foxit's PDF viewer, and it still does not show a checked value:

<</T(My Checkbox)/V/Off>>

Why is the checkbox null/Off? Has the check mark been flattened into an image? Is my only option now to look into using OCR?

Update: When I reset the form in a viewer, the check mark does go away, but the question remains: why is the value null in code? Where is the check mark value stored?
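
One way to narrow this down might be to compare the field's value entry (/V) with the appearance state (/AS) of each of its widgets, since a viewer could be drawing the check mark from the widget's appearance state rather than from the stored value. The following is only a minimal diagnostic sketch, assuming iTextSharp 5.x and the same file path and field name as above; it prints the raw entries and does not establish which one the viewer actually uses.

```csharp
using System;
using iTextSharp.text.pdf;

class CheckboxInspector
{
    static void Main()
    {
        // Assumption: same file and field name as in the question.
        PdfReader reader = new PdfReader(@"C:\form.pdf");
        try
        {
            AcroFields fields = reader.AcroFields;
            AcroFields.Item item = fields.GetFieldItem("My Checkbox");
            if (item == null)
            {
                Console.WriteLine("Field not found");
                return;
            }

            // A field can have several widgets; inspect each one.
            for (int i = 0; i < item.Size; i++)
            {
                // Merged dictionary: field entries combined with widget entries.
                PdfDictionary merged = item.GetMerged(i);
                // Widget dictionary: the annotation that is actually drawn.
                PdfDictionary widget = item.GetWidget(i);

                Console.WriteLine("Widget " + i);
                Console.WriteLine("  /V  (field value):      " + merged.Get(PdfName.V));
                Console.WriteLine("  /AS (appearance state): " + widget.Get(PdfName.AS));
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If /AS turns out to be /On while /V is missing or /Off, that would suggest the two entries are out of sync in this particular file, which would be consistent with the output above.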



