
Since my last post, and yet another private self-promise to post on a more regular basis, with more interesting or in-depth info than my previous rambles, I have managed to think of around 6 different post ideas, then forgotten about them. So in the absence of depth or interest, here’s some code instead. Or at least some writing about some code.
With the rise of Twitter from little-known geek project to social media darling in the last few months, that all-important 140 character limit has revealed the need for URL shortening services even more efficient than the ubiquitous TinyURL. Following on from Jonathan Snook’s post on building your own, I thought I’d have a little play around. I don’t write as much code casually as I used to and it’s good to keep your hand in.
The basics
Unless you’re insane, you can probably guess that you can’t actually convert very long URLs themselves to very tiny ones. Well, maybe you can, but if so, that seems unnecessary. You simply assign a unique numeric ID to the URL and convert that instead. But to what? Aaaah. Let me show you!
*wavey screen transition and swirly music*
The most commonly known, used and understood numeral system is the decimal system, otherwise known as base 10. It is known as such because it consists of 10 unique symbols, 0 through to 9. This is a nice, easy way for us to cope with accounting for the things which surround us, taking its base from the number of fingers (and toes) on your average human. There are other, less commonly used numeral systems using a wider selection of symbols which are better suited to other specific purposes, none of which I’m familiar enough with to talk about with any authority, but there’s always Wikipedia, should you want to know more.
The point is that we can take our URL’s unique ID in base 10 and convert it to a higher base equivalent, a number using a wider range of symbols per digit, so our shortened URL code can reach high values without taking on the physical length of its decimal value. How do you convert bases? Well, you could look up or work out something clever, be better educated than me and just know, or you could use PHP’s base_convert function. This converts numbers between any two bases from 2 to 36. For this particular purpose I’m going to the maximum, base 36. This base uses 0-9 and the letters a-z as its 36 symbols. So, for example, for a decimal value of 1,000,000:
<?php
echo base_convert(1000000, 10, 36); // Outputs 'lfls'
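To put a number on the space saving, here is a rough sketch comparing how many characters an ID takes in decimal against base 36. The helper name id_lengths is made up for this illustration and isn't part of the actual script.

```php
<?php
// Hypothetical helper: returns [decimal length, base 36 length] for an ID.
function id_lengths(int $id): array
{
    $code = base_convert((string) $id, 10, 36);
    return [strlen((string) $id), strlen($code)];
}

foreach ([1000, 1000000, 2147483647] as $id) {
    [$decimal, $base36] = id_lengths($id);
    echo "$id: $decimal characters in decimal, $base36 in base 36\n";
}
// 1000: 4 characters in decimal, 2 in base 36
// 1000000: 7 characters in decimal, 4 in base 36
// 2147483647: 10 characters in decimal, 6 in base 36
```

So even the largest signed 32-bit ID fits in six characters.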
When a user visits the URL, http://yourshortenerthingdomain.com/lfls, the reverse base_convert is done (same function, swap the 36 and the 10) and the resulting ID of 1000000 is used to retrieve the corresponding URL from the database. The user is then simply redirected.
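The round trip might look something like the sketch below. This is not the real script: the function names, the example URL and the array standing in for the database table are all assumptions made for illustration. The input check is there because base_convert doesn't reject stray characters by itself.

```php
<?php
// Hypothetical helpers; the names are illustrative only.
function id_to_code(int $id): string
{
    return base_convert((string) $id, 10, 36);
}

function code_to_id(string $code): ?int
{
    // Accept only 1-6 lowercase base 36 symbols (an assumed limit that
    // keeps the decoded value comfortably inside a 64-bit integer).
    if (!preg_match('/^[0-9a-z]{1,6}$/', $code)) {
        return null;
    }
    return (int) base_convert($code, 36, 10);
}

// Stand-in for the database table: ID => long URL.
$urls = [1000000 => 'http://example.com/a/very/long/path'];

function resolve(string $path, array $urls): ?string
{
    $id = code_to_id(ltrim($path, '/'));
    return $id === null ? null : ($urls[$id] ?? null);
}

$target = resolve('/lfls', $urls);
// A real request handler would redirect here, for example:
// header('Location: ' . $target, true, 301);
echo $target ?? 'not found', "\n";
```

Returning null for anything that doesn't look like a code means a bad request can be answered with a 404 rather than a lookup on a nonsense ID.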
Where is it?
I own a few different domains, some useful, some speculative for hopefully eventually completed future projects, others fairly useless. In the latter category is tdous.com. The domain represents a username I’ve used for some time on a lot of websites. It’s based on my initials, t and d, and the letters ous, as in suspiciOUS, so, pertaining to me, with the added consequence that t.d.ous sounds like tedious, as a HILARIOUS self-deprecating joke. So, after years of no use whatsoever, http://tdous.com/ now hosts an experimental (in the sense of it being a personal experiment, not that it holds some secret magic new technique) URL shortener. Only I will use it, as there are many such sites with shorter base URLs, but popularity or general use at all was not its purpose. It is essentially completely unnecessary. Also, it has been barely tested, but seems to work. There is only the most basic spam filtering.
There are issues with this approach. base_convert apparently loses precision at very high values as a result of how large numbers are handled in PHP (I assume it’s a PHP-specific problem, most likely because such values end up as floating point numbers), and there is a practical ceiling of around 2 billion IDs if the ID is held as a signed 32-bit integer, whose maximum is 2,147,483,647. Also, since this base 36 conversion uses a sequential pattern of symbols, from 0-9 then a-z in that order, its URLs are predictable: anyone can guess neighbouring codes by counting up or down. And because the codes will work through every combination of letters, this will eventually produce URL codes which could be offensive to some people. Though that’s not so much of a problem for me!
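If those issues did bother me, one possible workaround (a sketch only, not what tdous.com actually does) would be to skip base_convert and do the conversion by hand with integer arithmetic and a custom alphabet. The alphabet below is an assumption: digits plus consonants, 31 symbols, so no vowels turn up in the codes. That makes accidental words less likely rather than impossible, and the codes are still sequential and so still guessable. The function names are made up for the example, and intdiv needs PHP 7 or later.

```php
<?php
// Assumed alphabet: 10 digits + 21 consonants = base 31.
const ALPHABET = '0123456789bcdfghjklmnpqrstvwxyz';

function encode_id(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $base = strlen(ALPHABET);
    $code = '';
    do {
        // Peel off the least significant digit using integer maths only.
        $code = ALPHABET[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);
    return $code;
}

function decode_code(string $code): int
{
    if ($code === '') {
        throw new InvalidArgumentException('Code must not be empty');
    }
    $base = strlen(ALPHABET);
    $id = 0;
    foreach (str_split($code) as $symbol) {
        $position = strpos(ALPHABET, $symbol);
        if ($position === false) {
            throw new InvalidArgumentException("Invalid symbol: $symbol");
        }
        // Note: a very long code could still overflow PHP's integer range.
        $id = $id * $base + $position;
    }
    return $id;
}

echo encode_id(1000000), "\n";   // 12kl2
echo decode_code('12kl2'), "\n"; // 1000000
```

The price is slightly longer codes, five characters instead of four for the millionth URL, since each position now has 31 possible symbols instead of 36.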
So admittedly I’ve shown very little of the code involved, but most of it is the actual form handling, spam filtering, database connecting and so on. The script in total is very small and I thought I’d have more fun talking about this than literally showing it off. And I did. So I was right. And therefore I win.