• Posted on: Mar 21, 2009
  • Reactions: 89

> The perfect PHP clean url generator

In my hunt for the perfect clean URL (smart URL, slug, permalink, whatever) generator, I’ve always run into some exception or bug that turned the function into a piece of junk. But I recently found an easy solution that I hope I can call “definitive”.

Clean url generators are crucial for search engine optimization or just to tidy up the site navigation. They are even more important if you work with international characters, accented vowels /à, è, ì, .../, cedilla /ç/, dieresis /ë/, tilde /ñ/ and so on.

First of all we need to strip all special characters and punctuation away. This is easily accomplished with something like:

function toAscii($str) {
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $str);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", '-', $clean);

	return $clean;
}

With our toAscii function we can convert a string like “Hi! I’m the title of your page!” to hi-im-the-title-of-your-page. This is nice, but what happens with a title like “A piñata is a paper container filled with candy”?

The result will be a-piata-is-a-paper-container-filled-with-candy, which is not cool: the ñ is simply dropped. We need to convert all special characters to their closest ASCII equivalent.

There are many ways to do this; maybe the easiest is to use iconv.

setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str) {
	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", '-', $clean);

	return $clean;
}

I always work with UTF-8 but you can obviously use any character encoding recognized by your system. The piñata text is now transliterated into a-pinata-is-a-paper-container-filled-with-candy. Lovable.
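
One caveat is worth a sketch here: iconv returns false when it cannot convert the input, and what //TRANSLIT produces depends on the iconv implementation and locale of the system. A minimal guard could look like the following. The helper name translitOrFallback and the "strip non-ASCII bytes" fallback are assumptions made for this example, not part of the original function.

```php
<?php
setlocale(LC_ALL, 'en_US.UTF8');

// Hypothetical helper, not part of the article's function.
function translitOrFallback($str) {
    // iconv() returns false and emits a diagnostic when conversion fails;
    // @ silences the diagnostic so the result can be checked explicitly.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    if ($ascii === false) {
        // Assumed fallback: strip every non-ASCII byte instead of failing.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $str);
    }
    return $ascii;
}

// The exact result for non-ASCII input depends on the system's iconv.
var_dump(translitOrFallback('A piñata is a paper container'));
```

With a guard like this, the rest of the function always receives a string, even when the transliteration step fails.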

Unless they are Spanish, users will hardly ever search your site for the word piñata; they will most likely search for pinata. So you may want to store both versions in your database: a title field with the actual displayed text and a slug field containing its ASCII counterpart.

We can add a delimiter parameter to our function so we can use it to generate both clean URLs and slugs (in newspaper editing, a slug is a short name given to an article that is in production).

setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str, $delimiter='-') {
	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);

	return $clean;
}

// echo toAscii("A piñata is a paper container filled with candy.", ' ');
// returns: a pinata is a paper container filled with candy

There’s one more thing. The string “I’ll be back!” is converted to ill-be-back. This may or may not be an issue depending on your application. If you use the function to generate a searchable slug, for example, looking for “ill” would return the famous Terminator quote, which probably isn’t what you wanted. To avoid this we can add a $replace parameter: every character listed in it is turned into a space before the string is converted.

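setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str, $replace=array(), $delimiter='-') {
	// Turn the caller's extra separator characters into spaces first
	if( !empty($replace) ) {
		$str = str_replace((array)$replace, ' ', $str);
	}

	// Transliterate to ASCII, then drop everything that is not a letter, digit or separator
	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	// Collapse each run of separators into a single delimiter
	$clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);

	return $clean;
}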

You can now pass the function a list of characters to treat as word separators. Calling toAscii("I'll be back!", "'") gives you i-ll-be-back. Also note that the apostrophe is replaced before the string is converted to ASCII, because character encoding conversion may lead to weird results: on some systems é is converted to 'e, for example, so the apostrophe needs to be handled before the string is mangled by iconv.

The function now seems complete. Let’s stress test it.

echo toAscii("Mess'd up --text-- just (to) stress /test/ ?our! `little` \\clean\\ url fun.ction!?-->");
returns: messd-up-text-just-to-stress-test-our-little-clean-url-function

echo toAscii("Perché l'erba è verde?", "'"); // Italian
returns: perche-l-erba-e-verde

echo toAscii("Peux-tu m'aider s'il te plaît?", "'"); // French
returns: peux-tu-m-aider-s-il-te-plait

echo toAscii("Tänk efter nu – förr'n vi föser dig bort"); // Swedish
returns: tank-efter-nu-forrn-vi-foser-dig-bort

echo toAscii("ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöùúûüýÿ");
returns: aaaaaaaeceeeeiiiidnooooouuuuyssaaaaaaaeceeeeiiiidnooooouuuuyy

echo toAscii("Custom`delimiter*example", array('*', '`'));
returns: custom-delimiter-example

echo toAscii("My+Last_Crazy|delimiter/example", '', ' ');
returns: my last crazy delimiter example

I’m sure we are far from perfection, and some PHP/regex guru will probably soon bury me under my ignorance by suggesting an über-simple alternative to my function. What do you think?

/Reactions

    • Author: andré
    • Posted on: 2011/10/05
    • At: 16:44

    Great job man, thanks!
    I just put the code below, before return:
    [code]
    $clean = strtolower(trim($clean, '-'));
    [/code]
    This removes the “-” at the end of the string.
    bye bye

    • Author: Johnyz
    • Posted on: 2011/10/11
    • At: 08:46

    Perfect! Thank you a lot

    • Author: Herbert
    • Posted on: 2011/10/13
    • At: 18:52

    Brilliant use of iconv. Kudos.

    I just wanted to add that
    /[^a-zA-Z0-9\/_|+ -]/
    can be reduced a little to
    %[^-/+|\w ]%
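
A quick way to check that the two patterns behave the same is to run both on an ASCII-only sample, which is what the string looks like once iconv has done its work. The sample string below is made up for this sketch, and the equivalence is only claimed for ASCII input without the /u modifier.

```php
<?php
// Compare the article's character class with the shorter one
// on an ASCII-only sample string.
$sample = "Mess'd up --text-- /test/ ?our! fun.ction_1|2+3";

$long  = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $sample);
$short = preg_replace('%[^-/+|\w ]%', '', $sample);

echo $long, "\n";            // Messd up --text-- /test/ our function_1|2+3
var_dump($long === $short);  // bool(true)
```

Using % as the regex delimiter is what removes the need to escape the slash; \w stands for letters, digits and the underscore.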

    • Author: m4t
    • Posted on: 2011/11/02
    • At: 09:53

    Thanks man! Very helpful :)

  • This is great, but it doesn’t work if you put an @ in the string.

    • Author: (Almost)Happy User
    • Posted on: 2011/12/11
    • At: 11:09

    This function deletes dots in filenames. How to preserve them?
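
One possible approach, sketched under assumptions: split the file name with pathinfo(), run only the base name through the function and re-attach the extension. The helper name slugFilename is hypothetical, and only the last dot is kept, so archive.tar.gz would become archivetar.gz.

```php
<?php
setlocale(LC_ALL, 'en_US.UTF8');

// The article's final function, repeated so the sketch is self-contained.
function toAscii($str, $replace = array(), $delimiter = '-') {
    if (!empty($replace)) {
        $str = str_replace((array)$replace, ' ', $str);
    }
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
    $clean = strtolower(trim($clean, '-'));
    return preg_replace("/[\/_|+ -]+/", $delimiter, $clean);
}

// Hypothetical helper: slugify the base name but keep the extension.
function slugFilename($filename) {
    $info = pathinfo($filename);
    $name = toAscii($info['filename']);
    if (!isset($info['extension'])) {
        return $name; // no dot in the file name
    }
    // Keep only lowercase letters and digits in the extension.
    $ext = preg_replace('/[^a-z0-9]/', '', strtolower($info['extension']));
    return $ext === '' ? $name : $name . '.' . $ext;
}

echo slugFilename('My Holiday Photo (1).JPG'), "\n"; // my-holiday-photo-1.jpg
```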

    • Author: Christian
    • Posted on: 2011/12/15
    • At: 16:08

    Great man!!!

    • Author: Felix
    • Posted on: 2012/01/05
    • At: 09:50

    This function is great, nice clean and short. Is it possible to have the function change the following, if these are all in the same array…

    foo-bar foo > This I need to change to “foo-barfoo”

    foo bar foo > This I would need to change to “foobarfoo”

    but i can’t figure out how to do it with this function, without changing how it works.

    Can you help?

      • Author: TED
      • Posted on: 2012/02/26
      • At: 14:43

      Interesting.. that’s an odd example but I would do this by counting the blanks within and maybe putting each word into an array.. then associate and replace the blank.

      All depends on whether the bar foo are under your control or not, could become complicated quite quickly.

      Maybe a silly question, but are you just looking to delete the space in all words?

    • Author: TED
    • Posted on: 2012/02/26
    • At: 14:40

    Nice, will be using this for cleaning the urls on my new expatriate website. Thanks :)

    • Author: Gadgetdude
    • Posted on: 2012/03/08
    • At: 16:45

    How does it cope with foreign languages like Greek and Russian? Or do I have to add a function that will convert “здраствуйте” into “zdrastvuyte” for example?
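
How iconv handles non-Latin scripts varies a lot between systems. One option, assuming the intl extension is installed, is its transliterator_transliterate() function with the ICU rule 'Any-Latin; Latin-ASCII'. The helper name toLatin is made up for this sketch, and the exact romanization depends on the ICU version, so no output is shown.

```php
<?php
setlocale(LC_ALL, 'en_US.UTF8');

// Hypothetical helper: prefer ICU transliteration when intl is loaded,
// otherwise fall back to the iconv call used in the article.
function toLatin($str) {
    if (function_exists('transliterator_transliterate')) {
        $latin = transliterator_transliterate('Any-Latin; Latin-ASCII', $str);
        if ($latin !== false) {
            return $latin;
        }
    }
    // May return false or '?' characters, depending on the system.
    return iconv('UTF-8', 'ASCII//TRANSLIT', $str);
}

var_dump(toLatin('здраствуйте'));
```

The result could then be passed through the rest of the function in place of the plain iconv call.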

    • Author: simone
    • Posted on: 2012/04/06
    • At: 15:19

    I’m gonna use this in all my future projects. Thank you so much!

    • Author: João Paulo
    • Posted on: 2012/04/12
    • At: 03:19

    With a character limit:

    function url_slug($str, $replace=array(), $delimiter='-', $maxLength=200) {
    
    	if( !empty($replace) ) {
    		$str = str_replace((array)$replace, ' ', $str);
    	}
    
    	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    	$clean = preg_replace("%[^-/+|\w ]%", '', $clean);
    	$clean = strtolower(trim(substr($clean, 0, $maxLength), '-'));
    	$clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);
    
    	return $clean;
    }
    
    • Author: Sam
    • Posted on: 2012/04/17
    • At: 18:25

    Why do I get this message? Fatal error: Cannot redeclare toAscii() (previously declared in

    Thanks

    • you already have a toAscii function somewhere, call it toAscii2
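
This error usually means that the file defining the function was included twice, or that another library already uses the name. Besides renaming it or switching to include_once/require_once, the definition can be guarded with function_exists(). A minimal sketch, using the article's first version of the function as the body:

```php
<?php
// Define toAscii() only when no function of that name is loaded yet,
// which avoids the "Cannot redeclare" fatal error.
if (!function_exists('toAscii')) {
    function toAscii($str) {
        $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $str);
        $clean = strtolower(trim($clean, '-'));
        return preg_replace("/[\/_|+ -]+/", '-', $clean);
    }
}

echo toAscii('Hello World'), "\n"; // hello-world
```

Note that if a different toAscii() is already defined, the guard silently keeps that one, so renaming is the safer choice when the two functions do different things.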

    • Author: CloudCart
    • Posted on: 2012/04/20
    • At: 11:02

    Thanks for this! I’ve edited it a little bit because it didn’t work correctly for me:

    function parseUrl($sVar) {
        $sDelimiter = '-';
        $sVar = urldecode($sVar);
        $sVar = iconv('UTF-8', 'ASCII//TRANSLIT', $sVar);
        $sVar = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $sVar);
        $sVar = strtolower(trim($sVar, '-'));
        $sVar = preg_replace("/[\/_|+ -]+/", $sDelimiter, $sVar);

        return $sVar;
    }

    • Author: Erik Kralj
    • Posted on: 2012/05/07
    • At: 13:55

    Best function I’ve seen in a long time. Great tutorial! ;)

  • Everyone seems to forget about “ø” and “Ø” :-)
    str_replace("ø", "oe", ..)
    str_replace("Ø", "OE", ..)

    in some cases you may only want
    ø -> o
    Ø -> O
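
Manual replacements like these can be collected in one map and applied with strtr() before the string reaches the slug function. The helper name mapSpecialChars and the choice of replacements (including plain e for ə) are assumptions for this sketch; adjust the map to taste.

```php
<?php
// Hypothetical pre-mapping step for characters the transliteration
// does not convert the way you want. The file must be saved as UTF-8.
function mapSpecialChars($str) {
    $map = array(
        'ø' => 'oe', 'Ø' => 'OE',
        'ə' => 'e',  'Ə' => 'E', // assumption: plain e is an acceptable stand-in
    );
    return strtr($str, $map);
}

echo mapSpecialChars('Smørrebrød'), "\n"; // Smoerrebroed
```

Calling toAscii(mapSpecialChars($title)) then leaves iconv with only the characters it already handles.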

    • Author: Mark
    • Posted on: 2012/06/12
    • At: 13:09

    Thanks, Tweeted and using on my site.

    • Author: Tim
    • Posted on: 2012/06/14
    • At: 04:03

    Awesome! Thanks

  • I need to do this on my site. I’m currently using CodeIgniter’s native URL generator function, but it’s not good with foreign letters. My recent photos from Cordoba in Argentina are examples of letters going missing.

    • Author: andrew
    • Posted on: 2012/07/17
    • At: 05:32

    This is probably a stupid question, but how do I get the $clean to display as the URL? I mean, how would I get the URL to be http://www.mysite.com/i-ll-be-back
    Do I need to pass it through the htaccess file?

    thanks!
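
The function only produces the slug; making the server answer at that address is a separate step. Assuming the web server already forwards every request to a single PHP script (for example through a rewrite rule, not shown here), the PHP side could recover the slug from the request URI roughly like this. The helper name slugFromUri is hypothetical.

```php
<?php
// Hypothetical front-controller helper: extract the slug from a request URI.
function slugFromUri($uri) {
    // Keep only the path and ignore any query string.
    $path = parse_url($uri, PHP_URL_PATH);
    return trim((string)$path, '/');
}

// In a real script the argument would be $_SERVER['REQUEST_URI'].
echo slugFromUri('/i-ll-be-back?ref=home'), "\n"; // i-ll-be-back
```

The slug can then be looked up in the slug field of the database to find the page to display.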

    • Author: Benhard
    • Posted on: 2012/07/28
    • At: 18:20

    I think we need to trim the end of the sentence as well.
    function toAscii($str) {
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $str);
    $clean = strtolower(trim($clean));
    $clean = preg_replace("/[\/_|+ -]+/", '-', $clean);

    return $clean;
    }
    $str="I just say no! #$%^&*";
    print toAscii($str); //i-just-say-no
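
A variant that covers both ends and any delimiter, sketched under assumptions: collapse the separators first, trim afterwards, and only then swap in the requested delimiter. The name toAsciiTrimmed is made up for this example, and it is a rearrangement of the article's function rather than the author's own code.

```php
<?php
setlocale(LC_ALL, 'en_US.UTF8');

// Hypothetical variant of toAscii() that never leaves a stray
// delimiter at the start or end of the result.
function toAsciiTrimmed($str, $replace = array(), $delimiter = '-') {
    if (!empty($replace)) {
        $str = str_replace((array)$replace, ' ', $str);
    }
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
    // Collapse runs of separators into single hyphens first ...
    $clean = preg_replace("/[\/_|+ -]+/", '-', $clean);
    // ... then trim, so leftovers at either end are removed.
    $clean = strtolower(trim($clean, '-'));
    // Swap in the requested delimiter last.
    return str_replace('-', $delimiter, $clean);
}

echo toAsciiTrimmed('I just say no! #$%^&*'), "\n"; // i-just-say-no
```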

    • Author: Sara
    • Posted on: 2012/08/13
    • At: 12:14

    Marvelous! Just what I needed this very moment.
    I have a client who tried to use foreign characters in a file name during upload.

    So, before adding this to the client site (it’s online and critical), I just need confirmation that the code is fully working.

    (Since there are replies with edited code.)

    • Author: Marlon Douglas
    • Posted on: 2012/08/25
    • At: 17:45

    Thanks, very good!!

    • Author: Wuilliam
    • Posted on: 2012/09/11
    • At: 00:18

    Wonderful! Congratulations! It works just fine for me! :D

    • cool! thanks for sharing

    • Author: parasmani
    • Posted on: 2012/10/03
    • At: 11:47

    Awesome tool. Can be used for cleaning text content also.

  • Hi. Can I convert the character “ə” or “Ə”? I tried, but it doesn’t work with this character.

    • If it doesn’t work you have to convert it manually