In this post we will look at how the image tags managed metadata column creates issues for ShareGate migrations and how to resolve them.
Intro
Image tags were introduced in July 2021 and are automatically added by Microsoft AI to improve search. When images get uploaded into document libraries, SharePoint will analyse them upon upload and tag them based on a set of 37 basic tags. SharePoint will also re-analyse each time an image is updated and update the image tags.
The image tags column is a managed metadata column that isn’t immediately available within a document library. you have to upload an image first, before the image tag column appears – which can take several hours.
The problem
When migrating files at scale using migration tools such as ShareGate desktop, you will start to see errors in your migration reports related to the image tags column. In the example below, the details of each failed item showed the same or similar errors:
“Property Image Tags: The following values are unavailable: ‘outdoor, sign, text’. Please specify another value.”
ShareGate migration errors
Example migration report showing the image tag error messages.
The solution
The way I resolved this issue was to change an option within the migration task itself to ignore the image tag column managed metadata values. To do this I did the following in ShareGate:
In ShareGate desktop > start a copy content migration
Select your source and destination sites
In the copy content screen > select Options
Select Metadata > then expand Document
You will see the image tags column here > change set mapped value to ignore
Tick apply for all version history
Set your copy content migration task to ignore the image tasks column metadata.
I’ve been suitably inspired by Andrew Warland’s fantastic two-part series documenting his approach and migration to SharePoint Online, so much so that I thought it would be a fun series to write about my own experiences.
It is’nt my intention to necessarily document Microsoft best practice in this series, rather just to explore some of the challenges, sucesses and experiences I notice along the way.
The current situation
My organisation has recently made the decision to move to to the cloud, with O365 being the naturally preferred destination. SharePoint has been well embedded, and heavily used within the business for several years, with on-premises SharePoint 2010 currently in production.
Finally, in terms of the SharePoint architecture and data volume, there are only three web applications to merge together as part of the migration effort. However, there are several site collections within our main intranet web app, plus many sub-sites nested within them, meaning the huge database sizes behind these site collections could prove difficult come migration time.
A note on the new, flat structure
Our current environment has a well established top-down structure in place that is generally consitent across the environment.
Having already made the investment in ShareGate, this will be the tool of choice for the migration. In the version 11.0 release of ShareGate, a new restructure option now allows you to promote sub-sites to top-level sites post inital migration from the source SharePoint environment.
The new restructure option in ShareGate 11.0
Considerations for a successful migration plan
One of the biggest issues to be resolved before we can start any sort of migration activity, is the fact that we have several content databases well over the 200GB recommended general use size limit.
Microsoft best practice suggests that any environment that has site collections, sites, content databases, libraries or lists that exceed the software boundaries and limits should be remediated prior to any migration activity. In this case, the main idea is to split each content database that exceeds 200GB into seperate content db’s, and where neccessary, move or promote sub-sites to site collections and attach new db’s.
Armed with the knowledge of the recent restrcuture functionality coming to ShareGate, plus my own personal feeling that any remediation activities to our current environment may in of itself carry adverse risk to the estate we proposed a different approach.
With all the reporting capabilities at our disposal via ShareGate, I was able to get a firm grasp of what resides within each site collection in our environment, in terms of:
The size of each sub-site underneath the top-level
Number/ size of libraries and lists
Number of items in each of the above
Any workflows running in any of the above
From this I ran a trial migration of a sub-site from SharePoint 2010 to a newly created team site in SharePoint Online.
Pre-migration
Before I kicked off the migration, I ran the source analysis tool within the Migration > Plan section of ShareGate. I noted the following obersavations:
The source analysis within “migration” in the ShareGate tool, although listed as only being able to analyze up to SharePoint 2013, does in-fact work for 2010
ShareGate source analysis
The source analysis cannot run at the sub-site level, meaning that you need to run it at the site collection level then just filter down to the sub-site in question through the report itself
Source analysis gives you a report of all checked-out files within a source site.From this, I created a simple view within each of the libraries that contained checked-out files to send to the site owners for action
Post-trial migration
The trial migration completed successfully as expected, however there were several interesting results I noted:
1. Everyone receieves a welcome email
If you migrate the permissions, once the source permission groups migrate each user will recieve a welcome email to the new SharePoint Online site.
Publishing sites seem to be the trickiest to migrate, especially those with custom master pages or page layouts. When migrating publishing sites, the Pages library is migrated wholesail, meaning the content won’t reside in the SitePages library (where new client-side pages are located).
3. Un-editable modern homepage
After the migration had completed, the new team site homepage threw up an error every time you tried to edit it.
I tried some of the documented resolution steps found here, but none of them worked for me. My solution was to just create a new page to replace the broken homepage, add all the relevant webparts and make this one the new default homepage.
Transforming classic publishing site pages to client-side pages
Publishing site pages will all be migrated as classic SharePoint pages, without the modern look and feel of a client-side page. My understanding is that for publishing pages with custom page layouts, additional metadata or custom content types will need to be transformed via PowerShell and creating a custom mapping file.
(I’m planning on writing a seperate blog post walking through an advanced publishing page transformation in the near future)
Its also worth considering that in the release notes for ShareGate 11.0 it makes mention of the fact they are researching the ability to transform classic to modern pages, so that could well simplify this process in a future release.
Conclusion
Overall, I was happy with our trial migration and believe it is a viable approach for us to move from on-prem to O365. Some lessons learned for myself would be to consider and SharePoint permissions audit prior to migration to remove any unecessary permissions, send an inventory out to site owners aswell as checked-out files, all in the name of reducing the migration effort.
This will be an ongoing series of posts, which i’ll focus more the on the nitty-gritty of the migration effort than anything else, but as always if there is any feedback or suggestions on how to improve this site, please let me know!
Data retention and deletion…I’m sure this is a something that anyone involved in Office 365, SharePoint on information management in general gets fed up of saying since the recent GDPR legislation!
Recently we have been rationalising and cleaning up our data in preparation for moving to Office 365. We are starting with SharePoint as the first target repository or silo of content.
The general consensus is to delete files and folders over 7 years old unless there is a pre-existing data retention policy to adhere to. So the next task is to identify those files that fall within our threshold, and ultimately delete.
Luckily, we have Tree Size Pro and ShareGate so I was able to relatively easily identify the files in question (there were a lot!).
The setup
As our SharePoint environment is a) rather full; and b) rather old, I made the decision to incrementally delete files rather than en-masse to mitigate risk, targeting the lists/libraries containing the most out of date content. I started by creating a view in the first library – library A with the following parameters:
Standard library view
Filtered by Created Date if less than or equal to 01/01/2011
Folders or Flat: Show items inside folders Show this view: In all folders
(all other settings are left default)
Results this returned looked good, I could see folders and files in this view that matched the criteria – brilliant! Based on my previous statement I decided to delete in batches out of working hours, again to mitigate risk. I deleted first from library A, then from the first stage and finally from the second stage recycle bin all in this fashion.
The problem
I had permanently deleted around 50% of the total volume of content to be deleted from library A when we started to receive reports of current files being ‘missing’ from library A…not a good day.
After these reports were investigated they were indeed true. It turns out that when folders are included within a library view, folders that match the filter will be shown in the view, regardless of whether the files inside match.
We tested the view exluding folders and all the files returned matched the filter criteria. The same results were demonstrated from a SharGate report of the same nature. The report of all files over 7 years old brought back folders over 7 years old, but they also contained files that were newer.
Conclusion
At present, we are not entirely sure as to why these filters are not able to drill down past a top-level folder. It appears to be difficult to specify via view settings to only show files within folders, including the folder itself that matches the criteria.
We have decided to omitt folders from our reports and views going forward and to solely focus on files as this is the most reliable way we can delete files.
Bonus: for those of you with ShareGate, heres an example of my report we created to bring back all files over 7 years old, excluding folders. I ran this report across the entire intranet application over a weekend and it worked a treat 🙂