MarkItDown is a Python tool for converting files and office documents to Markdown.
It is easy to use:
markitdown path-to-file.pdf > document.md
Or use -o to specify the output file:
markitdown path-to-file.pdf -o document.md
You can also pipe content:
cat path-to-file.pdf | markitdown
The project can be found on GitHub at https://github.com/microsoft/markitdown.
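If you have a whole folder of documents, the same command line can be driven from a short Python script. The following is only a sketch: it assumes the markitdown executable is on your PATH and relies solely on the -o option shown above. The helper names build_command and convert_folder are made up for this example and are not part of the tool.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(src: str, dest: Optional[str] = None) -> List[str]:
    """Build the argument list for one markitdown call.

    Hypothetical helper: it only mirrors the two CLI forms shown above,
    `markitdown FILE` and `markitdown FILE -o OUTPUT`.
    """
    cmd = ["markitdown", src]
    if dest is not None:
        cmd += ["-o", dest]
    return cmd


def convert_folder(folder: str, pattern: str = "*.pdf") -> List[Path]:
    """Convert every file matching `pattern` in `folder` to a .md file beside it.

    Assumes the `markitdown` executable is installed and on PATH.
    """
    written = []
    for src in sorted(Path(folder).glob(pattern)):
        dest = src.with_suffix(".md")
        # check=True raises CalledProcessError if markitdown exits with an error
        subprocess.run(build_command(str(src), str(dest)), check=True)
        written.append(dest)
    return written
```

Calling convert_folder("docs") would then write one .md file next to each PDF in a folder named docs (a placeholder name). Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.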
Let's install it with these commands:
git clone https://github.com/microsoft/markitdown.git
Cloning into 'markitdown'...
remote: Enumerating objects: 2168, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 2168 (delta 0), reused 0 (delta 0), pack-reused 2162 (from 2)
Receiving objects: 100% (2168/2168), 4.15 MiB | 2.50 MiB/s, done.
Resolving deltas: 100% (1238/1238), done.
Updating files: 100% (161/161), done.
cd markitdown
python -m pip install -e "packages/markitdown[all]"
Obtaining file:///C:/Python313_64bit/markitdown/packages/markitdown
...
Successfully installed XlsxWriter-3.2.9 azure-ai-documentintelligence-1.0.2 azure-core-1.39.0 azure-identity-1.25.3
cobble-0.1.4 coloredlogs-15.0.1 humanfriendly-10.0 isodate-0.7.2 magika-0.6.3 mammoth-1.11.0 markdownify-1.2.2
markitdown-0.1.6b2 msal-1.36.0 msal-extensions-1.3.1 olefile-0.47 onnxruntime-1.20.1 pdfminer-six-20251230
pdfplumber-0.11.9 pypdfium2-5.7.0 python-pptx-1.0.2 speechrecognition-3.16.0 standard-aifc-3.13.0 standard-chunk-3.13.0
xlrd-2.0.2 youtube-transcript-api-1.0.3
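After the install finishes, a quick check from Python can confirm that the package is visible to your interpreter. This is a minimal sketch: installed_version and convert_to_markdown are names invented for this example, and the MarkItDown class with its convert method and text_content attribute is used as I understand it from the project's README, so treat that part as an assumption to verify against the version you installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def convert_to_markdown(path: str) -> str:
    """Convert one file to Markdown text through the Python API.

    Assumption: MarkItDown().convert(path) returns an object with a
    `text_content` attribute, as described in the project's README.
    """
    # Imported lazily so this module still loads when markitdown is absent
    from markitdown import MarkItDown

    result = MarkItDown().convert(path)
    return result.text_content
```

With the editable install above, installed_version("markitdown") should return the version string that pip reported, and convert_to_markdown("path-to-file.pdf") should return the same Markdown that the command line prints.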