Tales from a WordPress Migration - How to (not) export your media library?

Earlier in this week I got into a bit of a pickle with large WordPress Migration. Granted it was the third time that I'd had to migrate the site but that's another story!

The current website is cluttered with plugins that aren't serving any purpose – x36 to be precise. There are also some 6000 posts and 90 odd pages that needed migrating. With the age of the site and the number of plugins, as you can imagine the database is currently pretty large (260mb) so I didn't really see any benefit in a straight clone of the existing site. I wanted to start from a fresh so the new website would start with a clean database – we'd be clearing out of the a lot of the clutter.

Additionally for a bit of background the new site is using a new custom theme to remove the reliance on a page builder. In terms of plugins on the development site we've now reduced this to the following four; ACF, Classic Editor, Yoast and Events Calendar. On the development version we're using; WP All Import plus User Import Add-On, WP Migrate and WP Bulk Delete.

I'm not sure how bespoke the below approach is and therefore if this is likely to be of assistance to anyone but it's been nice to reflect on something, that which at the time was a pretty stressful. In my head technology shouldn't be this hard especially after some twenty years of experience but I guess it just goes to show we never know when we take on a job what challenges it is going to present and with experience and know how we can usually come up with a solution.

Anyway I digress – back to the migration which is what I want to concentrate on for this post. Before we get into it, it goes without saying when working with databases directly be careful what you run and ensure you take regular backups before you run anything that might break them.

WP All Import/Export & WP Bulk Delete #

WP All Import/Export is a plugin that I first purchased a couple of years back and it has been worth every penny, not just for this migration but for many others too. There is a bit of a learning curve in using the plugin, one that is certainly beyond the scope of this post. Basically a lot of trial and error when you first start. It certainly helps having a solid understanding of the structure of how WordPress stores its data to get the best out it. It's also very easy to make mistakes so it's always worth taking an export of your database before you run any large imports.

As I'd actually carried out the migration twice previously, at first I tried using the filters available within WP Export to only export posts modified after a certain date to prevent having to process 6k of posts on each export. However I struggled to get the numbers to sync once the import was completed. So reverted to using WP bulk delete to remove all posts (creating a SQL dump once deleted which then allowed me to reset the database should there be an error during the import).

On reflection I think my main issue was because WP Import was trying to match posts based on titles so if you have posts with a duplicate title this will cause an update and not a new post to be created. In the end I switched this out to match via slug but this does not ship natively with the plugin so I added the following to the custom functions available to the plugin;
https://gist.github.com/Garconis/8383b998c945b97529930d3c3918028a

Migrating WordPress Media Library #

The biggest challenge (mistake) of this particular migration was exporting the current media library. Now if I'd of thought of this earlier I should have moved this across first. But in this case I only remembered the media after finalising the migration of the posts and pages. In my wisdom I ventured down the rabbit hole of finding a way to move the existing media library (some 3GB of files) from the current site to the new development version where perhaps I would have been better moving the media and then re-importing the posts.

Now if you're not aware of how media generally works in WordPress you've got an uploads folder split by year and month, within which are the various uploaded files plus its variation of sizes. For these to be usable/visible within the Media Library each must then have an entry within the posts table in the database plus the following post meta stored within the postmeta table;

  • _wp_attached_file - it's path within uploads
  • _wp_attachment_metadata - additional data such as dimensions

The problem being that my posts table is now full of posts and pages with their own IDs so I can't just lift the existing attachment posts from the current site and drop them into the database not without creating a bunch of new ID for each post and of course its related meta data.

My first thought was to create some custom code to create a loop to create the updated IDs but this comes with the complication of including the postmeta plus I'd then need to write something the other side to then import the data. At this point I was getting frustrated it was 3pm and I'd been working the migration all day and I'd expected to have it sorted by now. I needed something that was going to be quick and fairly easy to action. What I ended up with was custom SQL query which I could run in TablePlus to export the attachment posts as a CSV. I could then take the CSV and import this using TablePlus. Here's how I did it.

Consideration🤔... Once I'd written this post I noticed a couple of potential issues. The first being when an attachment has a parent post. The second being you'll need to make sure you've migrated your users and that their IDs are going to match those of the imported posts.

The Export Query (Export all attachment posts from WordPress database with meta data) #

  • Select everything (*) from the posts table where the post_type is equal to attachment
  • Then we do a find/replace REPLACE('guid',"YOURDOMAIN", 'YOURLOCALDOMAIN' ) AS new_guid to create a new_guid column within our export which will point to our local domain.
  • We then want a way to create a new_id column that auto increments ROW_NUMBER() OVER () + 1000 AS new_id. In our case by giving it the current highest id within our local posts table in the example below this would be 1000 so our first new attachment post id is going to be 1001. Granted I had to google this one and I think it's only available since MySql 8?
    https://www.mysqltutorial.org/mysql-window-functions/mysql-row_number-function/
  • We then need x2 sub queries to create a column for both bits of metadata _wp_attached_file and _wp_attachment_metadata.
  • Before each sub query we add in a column for the to act as the meta key for when we import to the postmeta table.
SELECT
*,
REPLACE(`guid`, "LIVEDOMAIN", 'LOCALDOMAIN') AS new_guid,
ROW_NUMBER() OVER () + 1000 AS new_id,
"_wp_attached_file" AS `attachment_key`,
(
SELECT
meta_value
FROM
wp_postmeta
WHERE
`meta_key` = "_wp_attached_file"
AND `post_id` = `wp_posts`.ID) AS attachment_file, "_wp_attachment_metadata" AS `metadata_key`, (
SELECT
meta_value
FROM
wp_postmeta
WHERE
`meta_key` = "_wp_attachment_metadata"
AND `post_id` = `wp_posts`.ID) AS attachment_meta
FROM
`wp_posts`
WHERE
`post_type` = "attachment"

The results can then exported from TablePlus as a CSV
Export button at the bottom of the screen in TablePlus. The resultibg export should look like something as per the below.

CSV export WordPress attachment with metadata

Your exported CSV should have same number of rows as stated at the bottom of your Media Library within Wordpress;
CSV export WordPress attachment with metadata

The Import #

Firstly you want to make sure that you have downloaded all files from the relevant directories from your live site usually wp-content\uploads down to the local development site. You also want to ensure that the Media library is empty in terms of data in the database this can be achieved via the following SQL statements.

DELETE FROM `wp_posts` WHERE `post_type` = "attachment";
DELETE FROM `wp_postmeta` WHERE `meta_key` = "_wp_attached_file";
DELETE FROM `wp_postmeta` WHERE `meta_key` = "_wp_attachment_metadata";

Once we have the CSV exported we can then import the attachments to our development WordPress install. This is hopefully 🤞 made easy with TablePlus.

  • Right click the table you want to import to so for example wp_posts and click Import > From CSV...
  • You can then browse to where you saved the exported CSV
  • Once selected TablePlus will show you its Import CSV Wizard which automatically maps the CSV column names to that of the database table. Though we want to import them into the development database with our new_id and new_guid columns so we need to amend both ID and guid columns to Do not Import. Then set the new_id column to map to ID and new_guid column to map to guid.
Do Not Import columns New ID & Guid Column Maps

We then need replicate the process twice for the postmeta table for each meta_key so;

  • For _wp_attached_file the CSV import map will be
    post_id > new_id meta_key > attachment_key meta_value > attachment_file
    Attachment File Mapping
  • For _wp_attachment_metadata the CSV import map will be
    post_id > new_id meta_key > metadata_key meta_value > attachment_meta
    Attachment Meta Mapping

TablePlus ❤️ #

As mentioned above I kept things simple and used the tools I had at my disposal. TablePlus is fantastic when working with databases and for jobs like this it really saves time! ps. I'm not sure if you do similar with PhpMyAdmin but I can't see why you couldn't?

Once imported your media library should be migrated across and accessible within your development version.

All that was left to do was to re-run the WP Import process and update the post content making sure you select the following image options options which will allow WP Import to search within the newly imported media library for any images with the content of posts/page, as far as I can tell there is no need to update URLs within your export it does a good job of handling these whilst importing. Just make sure you don't select download images hosted elsewhere which would create duplicates and is also a much slower process as for each image found it will attempt to download form the web and add to your library.

WP Migrate Image Import settings

One final snag in my case was that some of the posts did not have a featured image in place however it was set as the first image within the content. Granted as with anything with WordPress there is certainly plugins that would do this job for me but I opted to write something fairly simple so that I was in full control of the process. I'd got this far and I didn't want a plugin to take over and mess things up, plus there was only a handful where this was the case. Here's what I wrote – though it's not perfect and will most likely need some tweaks depending on your use case but hopefully it's a good starting point.

Firstly we use WP_Query to select all posts where they don't have a featured image (_thumnail_id does not exists within the post meta table). We then loop over each of the posts, if it has content then we can then search this using regex for any images and get the first.

We then need to replace the live domain within the image URL with our local domain.

We then need to remove the dimensions from the image e.g my-image-500x500.jpg will become my-image.jpg. This is in order for the function attachment_url_to_postid() (https://developer.wordpress.org/reference/functions/attachment_url_to_postid) to work – it needs the original filename.

Though I did note some issues with images are that are "scaled" so it may need tweaking to cover all bases. There is a ticket on trac here but I struggle to track down if this should work as it's fixed or if you need to roll our your own solution? Is it that the patch needs merging to core 🤷‍♂️

If an attachment id is found we then use set_post_thumbnail($post->ID, $attachment_id); to set the featured image for the post. We then push data to the $updates array which is there so you view/filter the results as required.


$args = array(
'posts_per_page' => -1,
'post_type' => 'post',
//'post_status' => array('publish', 'pending', 'draft', 'auto-draft', 'future', 'private', 'inherit', 'trash'),
'post_status' => array('publish'),
'meta_query' => array(
array(
'key' => '_thumbnail_id',
'value' => '?',
'compare' => 'NOT EXISTS'
)
),
);


$the_query = new WP_Query( $args );

$updates = [];


foreach($the_query->posts as $post) {

$attachment_id = null;
$first_image = null;
$status = null;

if($post->post_content!="") {
$output = preg_match_all('/<img.+?src=[\'"]([^\'"]+)[\'"].*?>/i', $post->post_content, $matches);
$first_img = $matches[1][0];
$local_path = str_replace("LIVEDOMAIN","LOCALDOMAIN",$first_img);
$local_path = preg_replace("/-([0-9]+)x([0-9]+)/","",$local_path);

$attachment_id = attachment_url_to_postid($local_path);

if($attachment_id) {
set_post_thumbnail($post->ID, $attachment_id);
}

};

array_push($updates,
[
'post_id' => $post->ID,
'post_slug' => $post->post_name,
'content' => $post->post_content!="",
'image' => $first_img,
'status' => $status,
'local_path' => $local_path,
'attachment_id' => $attachment_id
]);

};


$no_image = array_filter($updates, fn($row) => $row['image']);

Once the above was finalised. I then used WP Migrate Pro to do a full find a replace for live domains to local domains.

Tinkerwell #

Quick mention for Tinkerwell. I wrote and ran the above with using Tinkerwell. This is another tool that is extremely handy to have. Previously I would have written the code within the themes function file but I didn't realise that Tinkerwell works with WordPress out of the box!

Summary #

At the start of the article I pointed out that the existing database was some 250mb this is now reduced to now 59mb so I feel it was worth the effort to spend the time being selective as to what we wanted to migrate albeit I could of handled things is a better order.

Now that everything is migrated across it would also probably worth our while in running Media Cleaner plugin. This will delete unused media from the library - although there is a debate to say if it isn't being used then maybe someone may require to use it a later date.

Got a question hit me up on Twitter or Mastodon 👇 Got a tricky WordPress migration - let's talk!