Wednesday, 8 June 2022

Select checkboxes in a PDF with Python (PyPDF2)

I'm trying to automate filling this PDF: https://www.sepe.es/SiteSepe/contenidos/empresas/contratos_trabajo/asistente/pdf/temporal/Temporal.pdf (decrypt it with ilovepdf).

Filling in the text fields works fine, but I can't find a way to set the checkboxes. Many /Btn fields have /Kids, and those kids are other checkboxes that show up only as IndirectObject references. I also can't select or modify the ordinary checkboxes (the ones without /Kids) in this PDF. Examples below.

Code

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, DictionaryObject, NameObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # Ask viewers to regenerate field appearances when the file is opened.
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # Create and register an empty AcroForm dictionary if the catalog has none.
        # (Building IndirectObject(len(writer._objects), ...) by hand would point
        # at an object that already exists instead of a new AcroForm.)
        if "/AcroForm" not in catalog:
            catalog[NameObject("/AcroForm")] = writer._addObject(DictionaryObject())

        catalog["/AcroForm"][NameObject("/NeedAppearances")] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

reader = PdfFileReader("TEMPORAL COMPLETO12 de mayo_unlocked.pdf")
writer = PdfFileWriter()

set_need_appearances_writer(writer)

page = reader.pages[0]

writer.addPage(page)

# Texto4 is filled in, but the checkbox values are not applied
writer.updatePageFormFieldValues(
    writer.getPage(0), {'BOTON_TIPOJORNADA': '/1',
                        'BOTON_JORN': '/S',
                        'Texto4': 'Texto4'
                        }
)
with open("filled-out.pdf", "wb") as output_stream:
    writer.write(output_stream)
reader.stream.close()

If I tick a checkbox manually in a PDF viewer and then read the fields back:

reader.getFields()

OUTPUT (one checkbox selected):
[...]

'BOTON_JORN': {'/FT': '/Btn',
  '/Kids': [IndirectObject(160, 0),
   IndirectObject(162, 0),
   IndirectObject(167, 0),
   IndirectObject(172, 0)],
  '/T': 'BOTON_JORN',
  '/Ff': 49152,
  '/V': '/S'},
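
A note on the /Ff entry in this dump: in the button field flags table of the PDF 1.7 specification (ISO 32000-1, section 12.7.4.2), bit position 15 is NoToggleToOff, 16 is Radio and 17 is Pushbutton. The sketch below decodes 49152 under that reading. The helper name decode_button_flags is made up for illustration and is not part of PyPDF2.

```python
# Button-field flag bits, keyed by the 1-based bit positions used in ISO 32000-1.
BUTTON_FLAGS = {
    15: "NoToggleToOff",
    16: "Radio",
    17: "Pushbutton",
    26: "RadiosInUnison",
}


def decode_button_flags(ff: int) -> list:
    """Return the names of the button flags that are set in an /Ff value."""
    return [
        name
        for position, name in sorted(BUTTON_FLAGS.items())
        if ff & (1 << (position - 1))
    ]


print(decode_button_flags(49152))  # ['NoToggleToOff', 'Radio']
```

If this decoding is right, BOTON_JORN is a radio-button group rather than a set of independent checkboxes, which would fit the way /V holds a single state name for the whole group.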

OUTPUT (another checkbox selected):
[...]

'BOTON_JORN': {'/FT': '/Btn',
  '/Kids': [IndirectObject(160, 0),
   IndirectObject(162, 0),
   IndirectObject(167, 0),
   IndirectObject(172, 0)],
  '/T': 'BOTON_JORN',
  '/Ff': 49152,
  '/V': '/D'},
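
Comparing the two dumps, the only entry that changes on the parent is /V ('/S' versus '/D'). For a group like this, a viewer typically also expects each kid widget's /AS entry to name either that kid's own "on" appearance state (a key of /AP /N) or /Off. The sketch below models that update on plain dictionaries so it runs without a PDF. The two-kid layout is invented for illustration, and only the names '/S' and '/D' come from the output above. With PyPDF2 the same steps would presumably go through kid.getObject() and NameObject values, which is an assumption not verified against this file.

```python
def select_radio(parent: dict, wanted: str) -> None:
    """Select the kid whose 'on' state is `wanted` and switch the others off."""
    kids = parent["/Kids"]
    # For each kid, collect its 'on' state names: keys of /AP /N other than /Off.
    on_states = [
        {state for state in kid["/AP"]["/N"] if state != "/Off"} for kid in kids
    ]
    if not any(wanted in states for states in on_states):
        raise ValueError(f"no kid has an appearance state {wanted!r}")
    for kid, states in zip(kids, on_states):
        kid["/AS"] = wanted if wanted in states else "/Off"
    parent["/V"] = wanted


# Hypothetical stand-in for the BOTON_JORN group (plain dicts, not PyPDF2 objects).
group = {
    "/T": "BOTON_JORN",
    "/V": "/S",
    "/Kids": [
        {"/AS": "/S", "/AP": {"/N": {"/S": None, "/Off": None}}},
        {"/AS": "/Off", "/AP": {"/N": {"/D": None, "/Off": None}}},
    ],
}

select_radio(group, "/D")
print(group["/V"], [kid["/AS"] for kid in group["/Kids"]])
# /D ['/Off', '/D']
```

Raising on an unknown state name is a deliberate choice here: silently writing a /V that no kid can display would reproduce the "nothing gets selected" symptom.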

Another checkbox, this one with no /Kids, that I also can't select or modify is 'TEXTOCasilla de verificación25'. When selected, it has the value '/S#ED':

'TEXTOCasilla de verificación25': {'/FT': '/Btn',
  '/T': 'TEXTOCasilla de verificación25',
  '/V': '/S#ED'},
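
The '#ED' in this value looks like a PDF name escape: inside a name object, '#' followed by two hexadecimal digits stands for a single byte (ISO 32000-1, section 7.3.5). The sketch below undoes that escaping. Decoding the resulting bytes as Latin-1 is an assumption, since the specification treats names as byte sequences, and decode_pdf_name is a made-up helper, not a PyPDF2 function.

```python
import re


def decode_pdf_name(name: str, encoding: str = "latin-1") -> str:
    """Replace each #XX escape in a PDF name with the byte it encodes."""
    raw = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        name.encode("latin-1"),
    )
    return raw.decode(encoding)


print(decode_pdf_name("/S#ED"))  # /Sí
```

Under that assumption the "on" state of this checkbox is named '/Sí', so any value written to the field would need to match that name byte for byte.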

Any tips would be greatly appreciated.



