How to convert SharePoint pages into PDF files

This post details how to convert SharePoint pages into PDF using Power Automate

June 15, 2022

Libraries, PDF, Power Automate, SharePoint, Site Pages

In this post we step through how you can use Power Automate to convert modern SharePoint pages into PDF files and save them to a document library.

Intro

Recently I got asked to come up with a way to turn SharePoint pages into PDF files for use in an offline scenario. The converted SharePoint pages didn’t need to be formatted as it was only the body content of a SharePoint page that was needed. Also part of the brief was that when the SharePoint page is updated, the corresponding PDF file also updates.

There are several posts online that cover this very topic that I’ll reference at the end, but they didn’t quite do exactly what I wanted – so here’s my take on how to convert SharePoint pages into PDF files!

What you’ll need

A modern SharePoint site pages library (these come with every SharePoint site!)
A OneDrive location to temporarily store the SharePoint page outputs
Power Automate to build the automation
A document library to store the output PDF files

A note on the site pages library

In my example I didn’t want every site page to be converted into a PDF file, so I added a choice column to ‘tag’ the pages that should be converted. I set the default value of the choice column to ‘Site Page’, so that only the pages I deliberately tag with a different value (‘Runbook’ in this example) get converted. This is reflected in the condition step of the flow below.

*Add a choice column to ‘tag’ the pages you wish to convert to PDF.*

Building the flow

The trigger action for our flow is when a file is created or modified (properties only). This allows us to re-run the flow when SharePoint pages are updated to also update the PDF files.

Select the site you are using to create the SharePoint pages in site address (if you don’t see it listed, choose enter custom value and paste in the URL)
Select the Site Pages library under library name

Next, I’ve added a condition to only convert pages that have been tagged ‘Runbook’ to PDF.

*Condition: if Document type value is equal to ‘Runbook’.*

Note: make sure you select the Value dynamic content for your choice column, rather than the choice column itself as that will break your flow.
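To illustrate why the Value matters, here is a rough Python sketch of the comparison the condition performs. The column name `DocumentType` and the shape of the trigger output are assumptions for illustration only; check the output of one of your own flow runs for the real property names.

```python
# Hypothetical trigger output for one page. A choice column is assumed to
# arrive as an object, with the selected text held in its "Value" property.
page = {
    "ID": 12,
    "DocumentType": {"Id": 0, "Value": "Runbook"},
}


def should_convert(item: dict, wanted: str = "Runbook") -> bool:
    """Return True only when the choice column's Value matches the tag."""
    choice = item.get("DocumentType") or {}
    return choice.get("Value") == wanted


# Comparing the Value text works as intended.
print(should_convert(page))
# Comparing the whole choice object against the text does not match.
print(page["DocumentType"] == "Runbook")
```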

If yes, next is a send an HTTP request to SharePoint step. Here I’m using a REST API call to get the body content of the SharePoint page.

*Use a send an HTTP request to SharePoint step to get the body content of your page.*

Set the site address to the site in question
Set method to GET
Enter the following in Uri: _api/web/lists/GetByTitle('Site%20Pages')/items('ID')/CanvasContent1
Replace ‘ID’ with the dynamic content ID from the when a file is created or modified step
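As a minimal sketch of what the finished Uri looks like once the ID is filled in, the Python below builds the same string. The function name is hypothetical, and it assumes the numeric item ID goes inside the parentheses without quotes; the flow itself simply inserts the dynamic content.

```python
from urllib.parse import quote


def canvas_content_uri(item_id: int, list_title: str = "Site Pages") -> str:
    """Build the relative Uri that returns a page's CanvasContent1 field."""
    # quote() percent-encodes the space in the list title (Site%20Pages).
    encoded_title = quote(list_title)
    return f"_api/web/lists/GetByTitle('{encoded_title}')/items({item_id})/CanvasContent1"


print(canvas_content_uri(12))
```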

Note: The output of this step generates some additional stuff you probably won’t want in your PDF like this:

{ "d": { "CanvasContent1": "…" } }

I used the parse JSON step to strip this unwanted wrapper and keep just the body content of the page.

I added the body dynamic content from the send an HTTP request to SharePoint step in the content field in the parse JSON step
I copied the output body from the send an HTTP request to SharePoint step of a successful run in the flow history and pasted it into the parse JSON step

*Output body from send an HTTP request to SharePoint to paste into the parse JSON step from a successful flow run.*

I then pressed generate from sample, which output the following:

{
    "type": "object",
    "properties": {
        "d": {
            "type": "object",
            "properties": {
                "CanvasContent1": {
                    "type": "string"
                }
            }
        }
    }
}

*Parse JSON step with generated schema.*
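For readers who think in code, the parse JSON step is roughly equivalent to the Python sketch below. The sample response body is made up for illustration; only the `d` and `CanvasContent1` property names come from the schema.

```python
import json

# Hypothetical response body in the shape the schema describes; the HTML is
# a made-up stand-in for real page content.
response_body = '{"d": {"CanvasContent1": "<div><p>Restart the service.</p></div>"}}'


def extract_canvas_content(body: str) -> str:
    """Return the page HTML held in d.CanvasContent1."""
    data = json.loads(body)
    return data["d"]["CanvasContent1"]


print(extract_canvas_content(response_body))
```

Only the string inside `CanvasContent1` is passed on to the next step; the surrounding JSON is discarded.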

From this I then used a create file action to create a temporary HTML file in OneDrive (more on this later), with the following config:

Folder path: / (root of the OneDrive account)
File name: Name from when a file is created or modified step
File content: CanvasContent1 from the parse JSON step

*Create file action to create temporary HTML page in OneDrive.*
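The flow writes CanvasContent1 straight into the file. If you also want the page title in the PDF (a question raised in the comments), one option is to wrap the fragment in a small HTML document first. The Python below is only a sketch of that idea; the function name and the sample title are hypothetical, and in the flow you would build the same string with a compose or variable step.

```python
from html import escape


def build_html_document(title: str, canvas_html: str) -> str:
    """Wrap the page body fragment in a minimal HTML document with a heading."""
    safe_title = escape(title)  # the title is plain text, so escape it
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n{canvas_html}\n</body></html>"
    )


html_doc = build_html_document("Backup & Restore", "<div><p>Step one.</p></div>")
print(html_doc)
```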

Next, a convert file step to convert the HTML page into a PDF file:

File: ID from the create file step
Target type: PDF

Now we can use a create file action to create a PDF in our output document library in SharePoint:

Set the site address to the site you want to store the PDF files in
Set the folder path to the document library, or navigate to the relevant folder within that library
Set file name to file name from the convert file step
Set file content to file content from the convert file step

*The create file action creates the PDF file in the destination document library.*

I then used an update file properties action to pass metadata from the site pages library to the destination document library – this step is optional. Finally, a delete file action deletes the temporary HTML file we created in OneDrive earlier:

*Delete file action to remove the temporary HTML file.*

Here’s the flow in its entirety:

Issues & troubleshooting

Formatting issues with the send an HTTP request to SharePoint

As mentioned above, when just using the send an HTTP request to SharePoint action, the output contains markup that isn’t going to make sense within the PDF. The parse JSON action cleans this up and just leaves the body content of the page.

Create file action creates corrupt PDF files

When testing this flow out I originally didn’t have the convert file action in place. In the file name I added ‘.PDF’, but every time the output PDF was corrupt and errored like this when trying to open:

The flow also failed on this step and the error said that “Conversion of this file to PDF is not supported. (InputFormatNotSupported / pdf)”. I decided to scrap this approach, create an HTML page instead and add in the convert file action, which worked around this issue.
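A quick way to see why adding ‘.PDF’ to the file name is not enough: genuine PDF files start with a `%PDF-` signature, whereas the content being saved here is still HTML. The Python below is a small sketch of that check, with made-up sample bytes.

```python
def looks_like_pdf(content: bytes) -> bool:
    """Check for the '%PDF-' signature that genuine PDF files start with."""
    return content.startswith(b"%PDF-")


# Made-up samples: HTML saved under a '.PDF' name, and the start of a real PDF.
html_bytes = b"<div><p>Page body</p></div>"
pdf_bytes = b"%PDF-1.7\n..."

print(looks_like_pdf(html_bytes))
print(looks_like_pdf(pdf_bytes))
```

Renaming changes only the extension, not the bytes, which is why a real conversion step is needed.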

Overwriting existing PDF files causes flow to fail

During testing of this flow I also noticed that when the flow was triggered by an update to a site page, the create file action would fail with a status 400 error saying “A file with the name [file name] already exists”.

I’ve written a separate post on how to overwrite files using the create file action, but basically the answer was to turn off chunking within the action’s settings.

References

14 responses to “How to convert SharePoint pages into PDF files”

A Stark

October 24, 2023 at 12:07 am

Great article! Worked really well, but for me the images are not included in the PDF. I noticed the images are not included in the HTML files either, because the path to the images does not include the URL for my SharePoint site.

data-imageurl=\"/sites/GRPTestA/SiteAssets/SitePages/Hardware-Test-Page/1265678308-Circle-icons-computer.png

I created a string variable whose value is my SharePoint site URL. Any ideas on how I can insert this as a prefix for data-imageurl, maybe in the Parse JSON?

I apologize in advance, this is all new to me.
Appreciate the help 🙂

1. Anthony
  
  October 25, 2023 at 5:03 am
  
  Hi, thanks for the comment. I haven’t done this myself yet but I think you would need to store it as a variable, convert to base64 and insert into the html. I think the issue is placing the image in the right place as in my example we are not constructing the HTML
  
Tom

August 7, 2023 at 7:13 am

I created a Windows AutoHotkey (https://www.autohotkey.com/) script that runs off the SharePoint site pages list opened in the Chrome browser, with send to PDF as the default printer in my Chrome print settings.

The script is

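; Wait 10 seconds so you can click next to the first page in the list
Sleep, 10000
Loop, 100 {
; Open the selected page and give it time to load
Send, {ENTER}
Sleep, 5000
; Open the print dialog (Ctrl+P) and confirm it
Send, {CTRLDOWN}p{CTRLUP}
Sleep, 2000
Send, {ENTER}
Sleep, 2000
; Move focus three controls along and press Enter; adjust to match your dialogs
Send, {TAB 3}{ENTER}
Sleep, 2000
; Close the tab (Ctrl+W), then move down to the next item in the list
Send, {CTRLDOWN}w{CTRLUP}
Sleep, 1000
Send, {DOWN}
}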

Save this as an .ahk file using Notepad.

Open the site contents page in chrome and check send to pdf is set as the print setting:

…..sharepoint.com/sites/YOURSITE/SitePages/Forms/RecentChanges.aspx

Play the script using AutoHotkey (not going to explain that, do some googling research), click on an empty spot next to the first item in the list open in the Chrome browser (within 10s) and let the script run; it will loop through up to 100 pages. It is sending standard Windows commands that can be interpreted with some common sense upon a bit of study.

Note you may have to modify the script to match any differences in the order of your Chrome browser print settings. Again, I’m not going to help on customising this, it’s just to give ideas on what’s possible.

It does work well for an entire site for me.

Jesper Simonsen

August 3, 2023 at 11:35 am

Hi, no, the pictures do not show, probably because they are stored in the SiteAssets library.
But actually the Print to PDF can produce a close to screen view PDF; you just have to click More settings in the Print dialog, then select Customized under Scale and set the Scale somewhere around 40 to 50, adjusting using the preview. This way your PDF contains the pictures, and in the right place.

1. Jesper Simonsen
  
  August 7, 2023 at 9:22 am
  
  And if you choose Landscape as Layout it fits better the wide screen layout and you can scale higher up , keeping the layout and making the text readable if you are actually printing the PDF
  
Angelo

June 20, 2023 at 3:14 pm

Hello, thanks for your content.
I was able to generate the file, but when I try to open it I get an error. I tried to open it in Chrome, Edge and Acrobat Reader. It seems that the file is damaged. Do you have any suggestions to fix it? Thanks

Gaurab

April 29, 2023 at 9:09 am

After the conversion the photos are not visible

1. Anthony
  
  April 29, 2023 at 8:10 am
  
  That’s strange, they were for me
  
2. Mille
  
  June 16, 2023 at 12:02 pm
  
  I also could not see photos.
  
3. mike
  
  August 11, 2023 at 7:01 pm
  
  I also cannot get images to show up in the PDF. In the HTML file, they are standard image icons and in the PDF they are small box icons with red x’s. Would love to hear if anyone figures that out.
  
  Even more helpful would be if the links in the Quick Links webpart would actually show up as links rather than just the link title text. Let me know if anyone has any ideas!
  
  1. Anthony
    
    August 12, 2023 at 6:42 am
    
    Hi Mike, I think part of the issue would be putting the images back in the correct place. I’m not sure how you’d do this without manipulating the html for the page, which in turn might break it!
    
Anthony

July 11, 2022 at 10:23 pm

Hi, thanks for the comment. You could add another step to the flow to create a variable in which you format some HTML to include the page title and then the output JSON as the main content, then convert that into a HTML document then PDF

Jess D.

July 11, 2022 at 9:51 pm

Is there a way to get the site page title to appear on the PDF?

1. mike
  
  August 11, 2023 at 7:55 pm
  
  When you do create file, you can add your title field before the CanvasContent1 field
  