Git Based Backups, a Followup
March 31, 2010 on 1:08 pm | In Personal, Programming | By QBasicer | 2 Comments

I’ve been using git-based website backups for a while now. In my previous post, I talked about how my backup system works. So far it’s worked almost flawlessly, but I’ve learned a few lessons along the way.
The first lesson I learned is that you CAN dump a MySQL database and keep it in git, but you have to do it the right way. It turns out that by default, mysqldump writes each table’s rows as multi-row INSERT statements on a few very long lines, so adding a single row rewrites a whole line. That makes for enormous diffs and makes it almost impossible to tell what’s new and what’s old. The option that fixes this is “--skip-extended-insert”, which writes one INSERT statement per row (“--complete-insert” only adds column names to each INSERT; it doesn’t split the rows). This makes the initial version quite a bit larger, but each subsequent change much smaller. My database is 41MB when dumped, which is quite large.
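To see why one row per statement helps, here is a small Python sketch using the standard library’s difflib. The two dump formats are made-up stand-ins for a hypothetical `posts` table, not real mysqldump output: one mimics the default multi-row INSERT, the other writes one INSERT per row. Adding a single row replaces the whole line in the first case, but adds just one line in the second.

```python
import difflib

def changed_lines(old: str, new: str) -> int:
    """Count lines added or removed between two dumps (ndiff marks them '+ ' / '- ')."""
    return sum(
        1
        for line in difflib.ndiff(old.splitlines(), new.splitlines())
        if line.startswith(("+ ", "- "))
    )

# Hypothetical table contents, for illustration only.
ROWS = [(1, "Hello"), (2, "Backups"), (3, "Git")]

def extended_dump(rows):
    """Mimic mysqldump's default: every row packed into one INSERT line."""
    values = ",".join(f"({i},'{t}')" for i, t in rows)
    return f"INSERT INTO `posts` VALUES {values};\n"

def per_row_dump(rows):
    """Mimic --skip-extended-insert: one INSERT statement per row."""
    return "".join(f"INSERT INTO `posts` VALUES ({i},'{t}');\n" for i, t in rows)

new_rows = ROWS + [(4, "Followup")]
print(changed_lines(extended_dump(ROWS), extended_dump(new_rows)))  # 2: old line removed, new line added
print(changed_lines(per_row_dump(ROWS), per_row_dump(new_rows)))    # 1: only the new row's line
```

On a real 41MB dump the difference is the same in kind: with packed INSERTs, git has to store a fresh copy of every long line that gained a row.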
The second lesson I’m now learning is that git stores the entire history locally inside the .git directory. It doesn’t take much space at first, since it’s compressed, but it grows. I have limited space on my host, and my .git directory is currently pushing 131MB. My entire blog directory is now 191MB, so the working copy is 60MB, and once you subtract the 41MB database dump, the actual content is just 19MB. That’s a LOT of space for history (I blame it all on the database; the actual content doesn’t change much), even though each individual diff is only about 500K. I did some digging, and the kernel.org folks said:
In general, git is not a viable solution for the case of a large repository with relatively small individual checkouts. However, if developers do not intend to clone, fetch, push into or push from their repositories, then use shallow clones.
Well, that certainly limits the usefulness, now doesn’t it! Unless I’m missing something, it’s basically saying that if you use ‘shallow’ copies, you can’t really do anything with them except create patches locally, apply them to a full copy, and commit there. In a large team, I would say this makes git almost useless. Back to my situation: it means shallow copies would limit the usefulness of my backups to almost zero.
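Since it’s the .git directory that eats the space, here is a small Python sketch for keeping an eye on it: it totals file sizes under `.git` versus the whole repository. It sums apparent file sizes, so the numbers won’t exactly match `du`, which counts disk blocks. The function names are just for illustration.

```python
import os
from pathlib import Path

def tree_size(root: Path) -> int:
    """Sum of file sizes in bytes under root; symlinks are skipped, not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total

def repo_breakdown(repo: Path) -> dict[str, int]:
    """Split a repository's size into history (.git) and working tree."""
    total = tree_size(repo)
    history = tree_size(repo / ".git")
    return {"total": total, "history": history, "working_tree": total - history}
```

Calling `repo_breakdown(Path("/path/to/blog"))` on the numbers above would report roughly 131MB of history against a 60MB working tree.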
So is git the best way to go? Well, maybe. Perhaps backing up the database the way I chose isn’t the best way. I probably would have been better off compressing it, naming it, and shipping it to another server. But hey, I had to find this out somehow, right?
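For comparison, the compress-and-name half of that alternative fits in a few lines of Python using the standard `gzip` and `shutil` modules. The date-stamped `name-YYYY-MM-DD.sql.gz` naming scheme and the local destination directory are assumptions; shipping the file to another server would still need scp, rsync or similar on top.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

def archive_dump(dump: Path, dest_dir: Path, day: Optional[date] = None) -> Path:
    """Gzip a SQL dump into dest_dir under a date-stamped name; return the new path."""
    day = day or date.today()
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{dump.stem}-{day.isoformat()}.sql.gz"
    with dump.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks so a large dump never sits in memory
    return target
```

Each archive is independent, so old ones can simply be deleted when space runs low, whereas trimming git history means rewriting it.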
Git Based Backup
March 7, 2010 on 7:47 pm | In Personal, Programming | By QBasicer | 1 Comment

Hey everybody, sorry about the extended absence. I’ve been very busy of late! I’ve recently decided to try out the newest thing in SCM (Source Control Management), Git. Lots of people are getting on the git bandwagon, and I’m not sure I’m totally on board yet. One of the things that appealed to me, however, is how easy it is to set everything up. After years without one, I’ve decided to have an actual backup solution for my website. I’ve been left without backups before, and it bit me in the arse recently. As you can probably guess, yes, I’m doing git-based backups of my website, database and all.
This lets me do some interesting things. First, I can have nightly backups and ‘checkpoints’, points I can easily roll back to if I screw something up. Second, I can have a site that is only a snapshot of the live site, whether for testing or redundancy (via branches). And lastly, I get a log of my recent changes, so I can review them and revert anything I did while working on a script. Outside of the nightly backup cycle, I typically commit and push with a detailed message whenever I change a script.
At first, I was curious to see how MySQL dumps would fare inside backups. So far they seem to do very well and don’t mind being in SCM. The database gets dumped to text, so stuff that doesn’t change… well, doesn’t change. I back up data to one spot regularly, and then semi-frequently to another. This is how it’s laid out:

The live website is denoted with a 1, and its attached database is also pictured. This is where the general traffic and changes happen, aka the critical data. Every night, the data gets replicated to computer #2, which is in another country entirely. Unfortunately it’s on a slow link and not 100% reliable, but hey, at least it’s somewhere, right? Computer #3 is my laptop, where I sometimes edit and test files before committing them. On the server (computer #1), those changes are pulled and merged into the repository automatically (if they’re non-conflicting) before the fresh changes are committed. I’ve been running this nightly for about half a week on my low-traffic sites, and recently moved it to my blog. The scripts have git automatically add new files and remove deleted ones, so each commit is a perfect snapshot of the blog. I’m going to continue testing this on more of my websites before I make it available to the websites that I host, including a control-panel-type feature for managing snapshots.
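The nightly cycle described above (pull and merge, then commit everything, then push) might look roughly like this in Python. It is a sketch, not the actual script: the site path, database name and dump filename are hypothetical, credentials are assumed to come from `~/.my.cnf`, and the current branch is assumed to already track an upstream on the main repository. The mysqldump options `--skip-extended-insert` and `--result-file` are real.

```python
import subprocess
from datetime import date
from pathlib import Path

# Hypothetical settings: change these for your own site.
SITE_DIR = Path("/home/user/public_html")  # working tree under git
DB_NAME = "blog"                           # MySQL database to dump
DUMP_FILE = "blog.sql"                     # dump is committed alongside the site

def backup_plan(db_name: str, dump_file: str, message: str) -> list[list[str]]:
    """Return the commands for one nightly run, in the order they run."""
    return [
        # Merge in changes pushed from the laptop; a conflict makes git exit
        # non-zero, which stops the run before anything is committed.
        ["git", "pull", "--no-edit"],
        # One INSERT per row, so rows that did not change stay identical in git.
        ["mysqldump", "--skip-extended-insert", f"--result-file={dump_file}", db_name],
        # -A stages new, modified and deleted files: a complete snapshot.
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]

def run_backup(site_dir: Path) -> None:
    """Execute the plan inside site_dir, skipping the commit if nothing changed."""
    message = f"Nightly backup {date.today().isoformat()}"
    for cmd in backup_plan(DB_NAME, DUMP_FILE, message):
        if cmd[:2] == ["git", "commit"]:
            # `git diff --cached --quiet` exits 0 when nothing is staged.
            staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=site_dir)
            if staged.returncode == 0:
                continue
        subprocess.run(cmd, cwd=site_dir, check=True)  # raise on the first failure
```

Calling `run_backup(SITE_DIR)` from a nightly cron job would reproduce the cycle. Because every command runs with `check=True`, a merge conflict or a failed dump halts the run and leaves the repository for a human to sort out rather than committing a half-merged state.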
It’s also worth noting that the connection between my server and the main git repository is secured by key-based SSH authentication. I haven’t had any issues with it so far, and it has been an excellent solution.
I like to give credit where credit is due, so I’m going to give mad props to my webhost, who processed my request for a shell within 15 minutes, installed git within 15 minutes, and fixed the git installation within 15 minutes. Their level of support has been truly remarkable. A big thanks to you guys! Check them out (A Small Orange). If you like them and decide to sign up, don’t forget to use the referral link on my blog.
