Monday, October 2, 2017

How can I insert a checkbox form into a .docx file using python-docx?

I've been using Python to implement a custom parser and then using the parsed data to format a Word document for internal distribution. All of the formatting has been straightforward so far, but I'm completely stumped on how to insert a checkbox into individual table cells.

I've tried using python-docx's low-level element methods (get_or_add_tcPr(), etc.), but this makes MS Word throw the following error when I try to open the file: "The file xxxx cannot be opened because there are problems with the contents. Details: The file is corrupt and cannot be opened."

After struggling with this for a while, I moved to a second approach: manipulating the word/document.xml file of the output document directly. I've retrieved what I believe is the correct XML for a checkbox, saved as replacementXML, and inserted filler text into the cells to act as a tag that can be searched for and replaced (searchXML). The code below seems to run using Python on Linux (Fedora 25), but Word shows the same error when I open the document. This time, however, the document is recoverable and reverts to the filler text. I've been able to get this to work with a manually created document and an empty table cell, so I believe it should be possible.

NOTE: I've included the whole XML element for the table cell in the searchXML variable, but I've also tried regular expressions and shorter strings rather than only an exact match, since I know the XML could differ from cell to cell.
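
The unzip, edit, and re-zip cycle can also be done without shell commands, using only the standard-library `zipfile` module. A .docx must keep `[Content_Types].xml` and the `word/` folder at the root of the archive; an archive built with something like `zip -r out.docx unzipped/*` from the parent directory stores every entry under an `unzipped/` prefix instead and is no longer a valid package. The sketch below copies each part under its original name, so that layout is preserved. The function name `rewrite_document_xml` and its `transform` parameter are hypothetical, and the sketch assumes word/document.xml is UTF-8, which is what Word and python-docx write.

```python
import zipfile


def rewrite_document_xml(src_path, dst_path, transform):
    """Copy a .docx package, passing word/document.xml through transform().

    transform receives the XML as a str and must return a str. src_path and
    dst_path must be different files, because src is still being read while
    dst is written.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                # The main document part is assumed to be UTF-8 encoded XML
                data = transform(data.decode("utf-8")).encode("utf-8")
            # Writing under the same name keeps every part at its original
            # path relative to the archive root
            dst.writestr(item.filename, data)
```

Any function from str to str can serve as `transform`, for instance one that performs the placeholder substitution done by replaceXML.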

import os
import re
import subprocess

# Placeholder cell as written by python-docx (contains no regex metacharacters)
searchXML = r'<w:tc><w:tcPr><w:tcW w:type="dxa" w:w="4320"/><w:gridSpan w:val="2"/></w:tcPr><w:p><w:pPr><w:jc w:val="right"/></w:pPr><w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:t>IN_CHECKB</w:t></w:r></w:p></w:tc>'

def addCheckboxes():
    os.system("mkdir unzipped")
    os.system("unzip tempdoc.docx -d unzipped/")

    # document.xml is UTF-8; decoding it as ISO-8859-1 and writing it back as
    # UTF-8 would garble any non-ASCII text
    with open('unzipped/word/document.xml', encoding="utf-8") as file:
        filedata = file.read()

    rep_count = 0
    while re.search(searchXML, filedata):
        filedata = replaceXML(filedata, rep_count)
        rep_count += 1

    with open('unzipped/word/document.xml', 'w', encoding="utf-8") as file:
        file.write(filedata)

    # Zip from inside unzipped/ so [Content_Types].xml and word/ sit at the
    # archive root (zipping unzipped/* from outside prefixes every entry)
    target = os.path.abspath("../buildcfg/tempdoc.docx")
    subprocess.run(["zip", "-r", target, "."], cwd="unzipped", check=True)
    os.system("rm -rf unzipped")

def replaceXML(filedata, rep_count):
    # rep_count is an int, so convert it before concatenating; the form-field
    # name and its bookmark name are kept identical
    name = "Check" + str(rep_count + 1)
    bookmark_id = str(rep_count)
    replacementXML = (
        '<w:tc><w:tcPr><w:tcW w:w="4320" w:type="dxa"/><w:gridSpan w:val="2"/></w:tcPr>'
        '<w:p><w:pPr><w:jc w:val="right"/></w:pPr>'
        '<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:fldChar w:fldCharType="begin"><w:ffData>'
        '<w:name w:val="' + name + '"/><w:enabled/><w:calcOnExit w:val="0"/>'
        '<w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox></w:ffData></w:fldChar></w:r>'
        '<w:bookmarkStart w:id="' + bookmark_id + '" w:name="' + name + '"/>'
        '<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText></w:r>'
        '<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:fldChar w:fldCharType="end"/></w:r>'
        '<w:bookmarkEnd w:id="' + bookmark_id + '"/></w:p></w:tc>'
    )
    # Replace only the first remaining placeholder cell
    return re.sub(searchXML, replacementXML, filedata, count=1)
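
Regarding the NOTE about cell XML differing from cell to cell: one option is to match only the placeholder run instead of the whole `<w:tc>`, so each cell keeps its own tcPr and paragraph properties, and to let `re.sub` call a function per match so every checkbox gets its own name and bookmark id in a single pass. This is a sketch under assumptions: the placeholder run is written by python-docx as a plain `<w:r>` without attributes whose text is exactly `IN_CHECKB`, and no other bookmark in the document uses the ids it assigns (starting at 1). `PLACEHOLDER_RUN`, `checkbox_runs`, and `replace_placeholders` are hypothetical names.

```python
import itertools
import re

# One run whose text is exactly the placeholder; group 1 captures its optional
# <w:rPr> so the checkbox runs keep the same formatting (e.g. font size)
PLACEHOLDER_RUN = re.compile(
    r"<w:r>(<w:rPr>(?:(?!</w:rPr>).)*</w:rPr>)?<w:t>IN_CHECKB</w:t></w:r>",
    re.DOTALL,
)


def checkbox_runs(rpr, index):
    """Return the runs of a legacy FORMCHECKBOX field named Check<index>."""
    name = f"Check{index}"
    return (
        f'<w:r>{rpr}<w:fldChar w:fldCharType="begin"><w:ffData>'
        f'<w:name w:val="{name}"/><w:enabled/><w:calcOnExit w:val="0"/>'
        '<w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox>'
        '</w:ffData></w:fldChar></w:r>'
        f'<w:bookmarkStart w:id="{index}" w:name="{name}"/>'
        f'<w:r>{rpr}<w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText></w:r>'
        f'<w:r>{rpr}<w:fldChar w:fldCharType="end"/></w:r>'
        f'<w:bookmarkEnd w:id="{index}"/>'
    )


def replace_placeholders(xml, first_id=1):
    """Replace every placeholder run, numbering the checkboxes sequentially."""
    counter = itertools.count(first_id)
    return PLACEHOLDER_RUN.sub(
        lambda m: checkbox_runs(m.group(1) or "", next(counter)), xml
    )
```

Passing a function to `re.sub` also sidesteps escaping: its return value is inserted literally, whereas a replacement string would have backslashes and group references processed.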

I have a strong feeling that there is a much simpler (and correct!) way of doing this through the python-docx library, but for some reason I can't seem to get it right.

Is there a way to easily insert checkbox fields into a table cell in an MS Word document? If so, how would I do that? If not, is there a better approach than manipulating the XML file directly?
