Our customers send us orders as PDF forms, which are generated from a Word document built with legacy form fields.
Currently, people at our customer center punch the orders into our system by hand, but we have decided to try to automate this task.
I'm able to read the text content of the PDF with a simple PdfReader, page by page:
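using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Concatenates the extracted text of every page (iTextSharp page numbers are 1-based).
public static string GetPdfText(string path)
{
    var text = new StringBuilder();
    using (var reader = new PdfReader(path))
    {
        for (var page = 1; page <= reader.NumberOfPages; page++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, page));
        }
    }
    return text.ToString();
}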
But this does not give me the checkboxes.
While iterating over every object in the PDF I can detect the checkboxes as dictionaries, but I can't distinguish them from the other objects or read their values:
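using System.Collections.Generic;
using iTextSharp.text.pdf;

public static IEnumerable<PdfDictionary> ReadCheckboxes(string path)
{
    using (var reader = new PdfReader(path))
    {
        var checkboxes = new List<PdfDictionary>();
        // Object 0 is always the head of the free list, so start at 1.
        for (var i = 1; i < reader.XrefSize; i++)
        {
            // Not every indirect object is a dictionary (arrays, numbers, nulls, ...),
            // so a hard cast would throw InvalidCastException; skip everything else.
            var dictionary = reader.GetPdfObject(i) as PdfDictionary;
            if (dictionary != null)
            {
                checkboxes.Add(dictionary);
            }
        }
        return checkboxes;
    }
}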
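A minimal sketch of how those dictionaries could be told apart, assuming the checkboxes exist as interactive widget annotations at all. Per the PDF specification, a check box widget has /Subtype /Widget, a field type /FT of /Btn (possibly inherited from its /Parent field), and neither the Radio nor the Pushbutton field flag set in /Ff; its state is the appearance-state name /AS, where /Off means unchecked. The class CheckboxScanner and its methods are hypothetical names, written against the same iTextSharp 5 API used above. If the scan finds nothing in the sample, the boxes are part of the page content rather than form fields, which would be consistent with the empty AcroFields.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper: detects check box widgets by their PDF dictionary keys.
public static class CheckboxScanner
{
    // Button field flags (1-based bit 16 = Radio, bit 17 = Pushbutton).
    private const int RadioFlag = 1 << 15;
    private const int PushbuttonFlag = 1 << 16;

    // Looks up an inheritable field attribute (/FT, /Ff) on the widget or up its /Parent chain.
    private static PdfObject GetInherited(PdfDictionary dict, PdfName key)
    {
        var depth = 0;
        for (var d = dict; d != null && depth < 32; d = d.GetAsDict(PdfName.PARENT), depth++)
        {
            var value = PdfReader.GetPdfObject(d.Get(key));
            if (value != null) return value;
        }
        return null;
    }

    // True if the dictionary is a widget annotation belonging to a check box field.
    public static bool IsCheckboxWidget(PdfDictionary dict)
    {
        if (!PdfName.WIDGET.Equals(dict.GetAsName(PdfName.SUBTYPE))) return false;
        if (!PdfName.BTN.Equals(GetInherited(dict, PdfName.FT))) return false;
        var flags = GetInherited(dict, PdfName.FF) as PdfNumber;
        var ff = flags == null ? 0 : flags.IntValue;
        return (ff & (RadioFlag | PushbuttonFlag)) == 0;
    }

    // A check box is "on" when its appearance state /AS is present and not /Off.
    public static bool IsChecked(PdfDictionary widget)
    {
        var state = widget.GetAsName(PdfName.AS);
        return state != null && !PdfName.OFF.Equals(state);
    }

    // Returns (object number, checked) for every check box widget in the file.
    public static List<KeyValuePair<int, bool>> ReadCheckboxStates(string path)
    {
        var result = new List<KeyValuePair<int, bool>>();
        using (var reader = new PdfReader(path))
        {
            for (var i = 1; i < reader.XrefSize; i++)
            {
                var dict = reader.GetPdfObject(i) as PdfDictionary;
                if (dict != null && IsCheckboxWidget(dict))
                {
                    result.Add(new KeyValuePair<int, bool>(i, IsChecked(dict)));
                }
            }
        }
        return result;
    }
}
```

Filtering on /Subtype and /FT rather than on position or size keeps the check independent of how Word laid out the form.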
What am I missing? I've also tried reading the AcroFields, but the collection is empty.
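For reference, a minimal AcroFields check with iTextSharp 5 might look like the sketch below (the class AcroFieldDump is a hypothetical name). On a PDF with real form fields it lists each field name, whether iTextSharp classifies it as a check box, and its current value; an empty list means the document has no AcroForm fields to read.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper: describes every AcroForm field in a PDF.
public static class AcroFieldDump
{
    public static List<string> DescribeFields(string path)
    {
        var lines = new List<string>();
        using (var reader = new PdfReader(path))
        {
            AcroFields fields = reader.AcroFields;
            // Fields maps fully qualified field names to their widget items.
            foreach (var name in fields.Fields.Keys)
            {
                var isCheckbox = fields.GetFieldType(name) == AcroFields.FIELD_TYPE_CHECKBOX;
                lines.Add(string.Format("{0}: checkbox={1}, value='{2}'",
                    name, isCheckbox, fields.GetField(name)));
            }
        }
        return lines;
    }
}
```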
I have uploaded a sample PDF with legacy checkboxes here.
Currently there is no option to integrate our systems or to make any changes to the underlying PDF or Word document.