Read Microsoft Word Document using Python code example

Word Document

Microsoft Word is an application that processes and stores text, images, charts, tables, and so on. Python offers a Word-processing library, python-docx, with which we can read and manipulate a document's text and other objects. Let's walk through the code to read a sample document in Python, referred to here as "sample.docx".

Prerequisites

python-docx: this library must be installed (for example with pip install python-docx) and is then imported under the name docx. It provides the objects the code below uses to read and manipulate a Word document.

Code example

import docx

# open the existing document
doc = docx.Document('sample.docx')

# read paragraphs by integer index (assumes the document has at least two paragraphs)
print(doc.paragraphs[0].text)
print(doc.paragraphs[1].text)

# a paragraph is made of runs: stretches of text that share the same formatting

# get the number of runs in a specific paragraph
print(len(doc.paragraphs[1].runs))
# get the text of the first run
print(doc.paragraphs[1].runs[0].text)
# iterate over every run in the paragraph
for run in doc.paragraphs[1].runs:
    print(run.text)

# read the complete text, one paragraph at a time

completeText = []

for para in doc.paragraphs:
    completeText.append(para.text)

print('\n'.join(completeText))
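
The loop above can be wrapped in small reusable helpers. The following is only a sketch: the names collect_text and read_docx_text, and the skip_empty option, are illustrative and not part of python-docx. The only thing collect_text assumes is that each paragraph object exposes a .text attribute, as python-docx paragraphs do.

```python
from typing import Iterable, List


def collect_text(paragraphs: Iterable, skip_empty: bool = True) -> List[str]:
    """Return the text of each paragraph-like object (anything with a .text attribute).

    Hypothetical helper: when skip_empty is True, paragraphs that contain
    only whitespace are left out of the result.
    """
    texts = []
    for para in paragraphs:
        text = para.text
        if skip_empty and not text.strip():
            continue
        texts.append(text)
    return texts


def read_docx_text(path: str) -> str:
    """Open a .docx file with python-docx and return its paragraph text as one string."""
    import docx  # imported here so collect_text stays usable without python-docx

    doc = docx.Document(path)
    return "\n".join(collect_text(doc.paragraphs))
```

Calling read_docx_text('sample.docx') would then return the same text as the loop above, minus blank paragraphs. Note that doc.paragraphs covers only the paragraphs in the document body; text inside tables is not included.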

Style property

Style is a very useful property in Word: it determines the look and feel of text within the document. It can be accessed as shown below:

Code example

import docx

# reference the existing document
doc = docx.Document('sample.docx')
# print the first paragraph
print(doc.paragraphs[0].text)
# print the name of the first paragraph's style
print(doc.paragraphs[0].style.name)
# change the style of the paragraph to Normal
doc.paragraphs[0].style = 'Normal'
# save the result as a new document; sample.docx is left unchanged
doc.save('myCustomStyle.docx')
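
Style names are also handy for inspecting a document, for example to see which styles it uses or to pull out only the headings. The sketch below is one possible approach, not an established recipe: style_counts, text_with_style, and summarize_styles are hypothetical helper names, and 'Heading 1' is assumed to be a style present in the sample file. The helpers rely only on each paragraph having .text and .style.name, which python-docx paragraphs provide.

```python
from collections import Counter


def style_counts(paragraphs):
    """Count how many paragraphs use each style name."""
    return Counter(para.style.name for para in paragraphs)


def text_with_style(paragraphs, style_name):
    """Return the text of every paragraph whose style name equals style_name."""
    return [para.text for para in paragraphs if para.style.name == style_name]


def summarize_styles(path, style_name="Heading 1"):
    """Print the style usage of a .docx file and the text set in one style."""
    import docx  # python-docx; imported here so the helpers above work without it

    doc = docx.Document(path)
    print(style_counts(doc.paragraphs))
    print(text_with_style(doc.paragraphs, style_name))
```

For example, summarize_styles('sample.docx') would print a count per style name followed by the text of all paragraphs styled as 'Heading 1', assuming the document uses that style.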
