Recovering partial data from a heroku postgresql database
It's common for customers to try our software then once they have decided they are committed and are going to use us, they typically want to delete all the "test" data. This customer had 2 businesses and 2 different accounts. He wanted to delete all tickets prior to 5/10/24, except for in one account, he wanted to keep all data for a particular Job. When asked to do some destructive or modifying behavior in production, I will first test it on a dev copy, make notes, save scripts and then only when everything checks out, will I run it in production. I did all this and started to repeat my steps in production. Here's where I made the dumb mistake. I started off in the wrong customer account. I deleted tickets from the one job he wanted to keep. My heat sunk and I got a uneasy feeling in my stomach.
Of course we have backups, but how would I restore or rollback only this customers' data? It was the middle of the business day, so we had tons of other customers modifying data. Plus, like most applications, deleting a record will likely have cascading effects. In our application, deleting a ticket, deletes ticket images, transactions, and statements.
After a lot of research and help from Heroku Enterprise support, I was able to restore the customers' data. It wasn't a simple process, but it's doable and I learned a lot about restoring data and Heroku Postgres.
Step 1: Heroku Rollback
Most heroku postgresql plans standard and above have the rollback feature. When I had heard of this before, I thought it meant rolling your main production database back, but that's not how it works. You create a copy of your production database and roll it back to a certain timestamp. So if you make a mistake, make sure you know what time you made it. To create the rollback database:
heroku addons:create heroku-postgresql:standard-2 --rollback HEROKU_POSTGRESQL_GRAY --to '2024-05-13 15:10-04:00' --app yourapp
This will take a little bit depending on the size of your db. Once it's finished, you can access the db like this:
Find the db name:
heroku pg:info -a yourapp
You should get your production db and also the new rollback db:
=== HEROKU_POSTGRESQL_AMBER_URL
Plan: Standard 2
Status: Available
....
Fork/Follow: Available
Rollback: earliest from 2024-05-13 20:56
Created: 2024-05-13 20:04
Forked From: HEROKU_POSTGRESQL_GRAY
Add-on: postgresql-fluffy-41625
Then you can access the new db by the add on name or the color name:
heroku psql HEROKU_POSTGRESQL_AMBER -a yourapp
Ok, this is great, now we have an exact copy of the db, right before I made a mess of things. But what I can do with it?
First, maybe we want to pull a copy to dev:
heroku pg:backups:capture HEROKU_POSTGRESQL_AMBER
heroku pg:backups # get backup ID of amber
heroku pg:backups:download [BACKUP_ID]
mv latest.dump before_disaster.dump
# you can now import this locally
createdb before_disaster
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d before_disaster before_disaster.dump
Note: be very detailed about what data you download. I also downloaded a production copy so I could test and compare:
heroku pg:backups:capture DATABASE
heroku pg:backups # get backup ID of DATABASE
heroku pg:backups:download [BACKUP_ID]
mv latest.dump after_disaster.dump
# you can now import this locally
createdb after_disaster
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d after_disaster after_disaster.dump
Step 2: Exporting the data
Now comes the part I had never done before. How do you export only selected records out of a db and then import them back into another db. It turns out it's not that difficult. Hat tip to this article. I'd read about postgres COPY before, but this article did a really good job of displaying it. So much so, that at first, I didn't think it would work. To keep this article simple, I'm not going to go into detail of all the records I had to export/import, but will just focus on 1 table. Log into your new rollback database:
heroku psql HEROKU_POSTGRESQL_AMBER -a yourapp
Then use the \COPY command to export a select set of data:
yourapp::HEROKU_POSTGRESQL_AMBER=>\copy (SELECT * FROM job_tickets where job_id = 10201) TO '~/Desktop/job_tickets.csv' WITH CSV DELIMITER ';' HEADER
Now if you're used to working with heroku and `heroku run rails c` you will know that you are on the heroku dyno and any calls to save a file will not be on your computer, but on the dyno. But postgresql is different. You are using a psql client, connecting remotely to your db from your system. This command actually saves to your local computer. It's a CSV file that you'll later see how to import back in.
Step 3: Importing the data
I did this on the database on my computer first.
psql after_diaster
after_diaster=#\copy job_tickets FROM '~/Desktop/job_tickets.csv' DELIMITER ';' CSV HEADER;
That's it! The data will be imported very quickly and efficiently. I was amazed!
Update your database.yml to connect your dev database to after_disaster. Check everything out. If everything looks good, you can now do this in production:
heroku psql DATABASE -a yourapp
yourapp::DATABASE=>\copy job_tickets FROM '~/Desktop/job_tickets.csv' DELIMITER ';' CSV HEADER;
Extra Credit:
You might imagine it wasn't as straightforward as this, but this does capture the strategy.
One thing I needed to do was import only the records that I deleted, but I didn't know which records I actually deleted without pulling up a version of before and after and comparing Id's and I didn't want to do that.
There's a pretty simple strategy I used (thanks to gpt). I first imported my records into a staging table, then did a subquery to determine if the record existed, if it did, I skipped it. This allowed me to import only the records I deleted. Here's the steps.
First create the staging table by copying the existing table's schema:
CREATE TABLE staging_job_tickets (LIKE job_tickets INCLUDING ALL);
Now import your exported CSV into this staging table (instead of directly into the main table like in step 3):
\copy staging_job_tickets FROM '~/Desktop/job_tickets.csv' DELIMITER ';' CSV HEADER;
Now copy from the staging table into the main table if the record doesn't already exist in the main table:
INSERT INTO job_tickets
SELECT * FROM staging_job_tickets staging
WHERE NOT EXISTS (
SELECT 1
FROM job_tickets target
WHERE target.id = staging.id
);
Now you were able to successfully restore records that were deleted.