Word Document
Microsoft Word is an application that processes and stores text, images, charts, tables, and so on. Python offers a Word-processing library with which we can read and manipulate a document's contents and other objects. Let's walk through the code to read a sample document in Python, referred to here as “sample.docx”.
Prerequisites
python-docx: this library must be installed (for example with pip install python-docx) and imported under the name docx. The code below relies on it to manipulate a Word document.
Code example
import docx

# Open the existing document
doc = docx.Document('sample.docx')

# Read individual paragraphs by integer index
print(doc.paragraphs[0].text)
print(doc.paragraphs[1].text)

# A paragraph is made up of runs; use them to inspect its pieces
# Number of runs in the second paragraph
print(len(doc.paragraphs[1].runs))

# Text of the first run
print(doc.paragraphs[1].runs[0].text)

# Text of every run in the second paragraph
for run in doc.paragraphs[1].runs:
    print(run.text)

# Read the complete text of the document
completeText = []
for para in doc.paragraphs:
    completeText.append(para.text)
print('\n'.join(completeText))
Style property
Style is a very useful property in Word: it determines the look and feel of text within a document. It can be accessed as shown below:
Code example
import docx

# Open the existing document
doc = docx.Document('sample.docx')

# Print the first paragraph
print(doc.paragraphs[0].text)

# Print the name of the first paragraph's style
print(doc.paragraphs[0].style.name)

# Change the style of the paragraph to Normal
doc.paragraphs[0].style = 'Normal'

# Save the result as a new document
doc.save('myCustomStyle.docx')