Git Based Backups, a Followup
March 31, 2010 on 1:08 pm | In Personal, Programming | By QBasicer | 2 Comments
I’ve been using git-based website backups for a while now. In my previous post, I talked about how my backup system works. So far it has worked almost flawlessly, but there have been lessons learned.
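The first lesson I learned is that you CAN dump a MySQL database and keep it in git, but you have to do it the right way. It turns out that by default mysqldump writes extended inserts: many rows packed into each INSERT statement on one very long line, so new data lands on the same line as the old. This makes the diffs between dumps extremely large and makes it almost impossible to tell what’s new and what’s old. The fix is to write each row as a separate SQL command. I used “--complete-insert” for this, but strictly it is “--skip-extended-insert” that splits rows into separate INSERT statements; “--complete-insert” adds the column names to every INSERT. One row per statement makes the initial dump quite a bit larger, but each subsequent change much smaller. The complete dump of my database is 41MB, which is quite large.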
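To see why one row per line matters for git, here is a small Python sketch that imitates both dump layouts for a hypothetical `posts` table and counts what a line-based diff (the kind git shows) reports after a single row is added. The helper names and table are made up for illustration, and the output only mimics mysqldump’s layout; it does not run mysqldump.

```python
import difflib

def extended_dump(rows):
    # Rough imitation of mysqldump's default extended inserts:
    # many rows packed into a single INSERT statement on one line.
    values = ",".join(f"({i},'{text}')" for i, text in rows)
    return [f"INSERT INTO posts VALUES {values};"]

def per_row_dump(rows):
    # Rough imitation of --skip-extended-insert: one INSERT per row, one line each.
    return [f"INSERT INTO posts VALUES ({i},'{text}');" for i, text in rows]

def changed_lines(old, new):
    # Lines a line-based diff marks as added or removed (file headers excluded).
    diff = difflib.unified_diff(old, new, lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

before = [(i, f"post {i}") for i in range(1, 1001)]
after = before + [(1001, "post 1001")]

ext = changed_lines(extended_dump(before), extended_dump(after))
per = changed_lines(per_row_dump(before), per_row_dump(after))
print(len(ext), len(per))  # 2 1
# Extended inserts: the whole table line is removed and re-added.
# One row per line: only the new row shows up.
print(sum(map(len, ext)), sum(map(len, per)))
```

With extended inserts, adding one row rewrites a line holding the entire table, so every commit stores and shows a change the size of the table; with one row per line the diff is just the new row.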
The second lesson I’m now learning is that git stores the entire history locally inside the .git directory. It’s compressed, so at first it doesn’t take much space, but it grows. I have limited space on my host, and my .git directory is currently pushing 131MB. My entire blog directory is now 191MB, so the working files come to only 60MB: the 41MB database dump plus just 19MB of actual content. That’s a LOT of space for history (I blame it all on the database; the actual content doesn’t change much), even though each individual diff is only about 500K. I did some digging, and the kernel.org folks said:
In general, git is not a viable solution for the case of a large repository with relatively small individual checkouts. However, if developers do not intend to clone, fetch, push into or push from their repositories, then use shallow clones.
Well, that certainly limits the usefulness, doesn’t it! Unless I’m missing something, it’s basically saying that you can’t really do anything with a ‘shallow’ copy except create patches locally, which you then have to apply and commit in a full copy. In a large team, I would say this makes git almost useless. Back to my situation: it reduces the usefulness of my backups to almost zero.
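To check how a backup directory splits between history and working files, as in the 131MB and 191MB figures above, a short Python sketch can total the two separately. The function names are hypothetical, and the sketch sums apparent file sizes, so its numbers can differ a little from what `du` reports as disk usage.

```python
import os

def tree_size(root, skip_git=False):
    # Sum the sizes of regular files under `root`, optionally skipping .git.
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_git and ".git" in dirnames:
            dirnames.remove(".git")  # prune so os.walk does not descend into it
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # don't count symlink targets
                total += os.path.getsize(path)
    return total

def history_vs_content(root):
    # Returns (bytes stored in .git, bytes in the working files).
    git_dir = os.path.join(root, ".git")
    git_bytes = tree_size(git_dir) if os.path.isdir(git_dir) else 0
    return git_bytes, tree_size(root, skip_git=True)
```

Calling `history_vs_content` on the blog directory periodically would show whether the history, rather than the content itself, is what keeps growing.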
So is git the best way to go? Well, maybe. Perhaps the way I chose to back up the database isn’t the best way. I probably would have been better off compressing the dump, naming it, and shipping it to another server. But hey, I had to find this out somehow, right?
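The alternative mentioned above (compress the dump, give it a dated name, ship it elsewhere) can be sketched in a few lines of Python. This is a minimal sketch, not the author’s setup: the function name `archive_dump`, the `blog.sql` file name and the timestamp naming scheme are assumptions, and the shipping step is left to whatever transfer tool you already use, such as scp or rsync.

```python
import gzip
import shutil
from datetime import datetime
from pathlib import Path

def archive_dump(dump_path, out_dir, when=None):
    # Compress a SQL dump into out_dir/<stem>-YYYYmmdd-HHMMSS.sql.gz
    # and return the path of the new archive.
    if when is None:
        when = datetime.now()
    dump_path = Path(dump_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{dump_path.stem}-{when:%Y%m%d-%H%M%S}.sql.gz"
    # Stream the file through gzip so large dumps are never held in memory.
    with open(dump_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `archive_dump("blog.sql", "backups", datetime(2010, 3, 31, 13, 8))` would write `backups/blog-20100331-130800.sql.gz`. Unlike a git history, old archives can simply be deleted when space runs low.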
2 Comments

There’s a command in git to squash all the commits up to a certain point into a single one. I *think* it’s git rebase (can’t remember, though).
So if you’ve got 100 commits and they’re taking up a huge amount of space, you can squash the first 99 (or however many you please) into a single commit. Yes, you lose the individual history, but if you’re doing regular backups, it might not be a bad idea to do this past a certain threshold. You’d lose some granularity, but you’d regain space.
Comment by jbrennan, March 31, 2010
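The squashing idea in this comment can be illustrated with a toy model in Python. It does not touch git: each commit is a hypothetical dict of file name to contents, and `storage_cost` is a crude stand-in that charges the full size of every changed file version (real git packs store compressed deltas). The sketch only shows the trade-off: the newest state survives, older snapshots disappear, and the estimated cost drops. In a real repository the space typically comes back only once the old commits are no longer referenced anywhere, including the reflog, and `git gc` prunes them.

```python
def storage_cost(history):
    # Crude size estimate: charge the full length of every file version that
    # differs from the previous commit (real git stores compressed deltas).
    total, prev = 0, {}
    for snapshot in history:
        for name, content in snapshot.items():
            if prev.get(name) != content:
                total += len(content)
        prev = snapshot
    return total

def squash_prefix(history, keep_last):
    # Collapse all but the newest `keep_last` commits into one commit that
    # holds the state as of the last squashed commit.
    keep_last = max(keep_last, 0)
    if keep_last >= len(history):
        return list(history)
    cut = len(history) - keep_last
    return [history[cut - 1]] + history[cut:]

# 100 hypothetical daily commits: the database dump grows, the theme never changes.
history = [{"db.sql": "row\n" * 25 * (day + 1), "theme.css": "body {}"}
           for day in range(100)]
squashed = squash_prefix(history, keep_last=10)

print(len(history), storage_cost(history))    # 100 505007
print(len(squashed), storage_cost(squashed))  # 11 104507
```

The latest checkout is identical either way; what is lost is the ability to restore any of the first 89 daily states.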
You can also use git gc to optimize the repository, though I don’t know how useful that will be for this unusual use case. Git is really intended for development, not general-purpose versioning.
Comment by Kibiz0r, May 1, 2010