
Auto-generating slugs with PHP

By: Roger Creasy

The term slug comes from the newspaper industry, where I spent 15 years of my career. Editors and reporters use a "slug" as a short name for an article, mostly while it is in development. The slug can contain information about the article, such as the section in which it will be published, or just a short version of the title.

In the web world, a slug is a URL-safe, human-readable version of the title used in the page URL. Rather than a URL like RogerCreasy.com/posts/php/12af54986, where 12af54986 is an id field in a database, a slug-based URL looks like RogerCreasy.com/posts/php/auto-generating-slugs-with-php.

Generating a slug using PHP is relatively simple. If you are working in MVC, add a method to whichever of your controllers persists content, or, if you need the slug generator in multiple controllers, put it in its own class. If you are not using MVC, place the method in a class and call it before you persist the article or post.

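    /**
     * Generate a URL-safe slug from a title
     */
    private function generateSlug(string $title): string
    {
        // Convert spaced dashes, em dashes, en dashes, figure dashes,
        // horizontal bars, underscores and spaces to hyphens. Do this while
        // the string is still UTF-8 so the multibyte dashes can be matched.
        $title = str_replace(
            [' - ', '—', '–', '‒', '―', '_', ' '],
            '-',
            $title
        );

        // Transliterate to the closest ASCII equivalent (e.g. "é" becomes "e").
        // iconv() returns false on failure, so keep the string unchanged then.
        $transliterated = iconv('UTF-8', 'ASCII//TRANSLIT', $title);
        if ($transliterated !== false) {
            $title = $transliterated;
        }

        // Remove everything except 0-9, a-z, A-Z and hyphens
        $title = preg_replace('/[^A-Za-z0-9-]+/', '', $title);

        // Collapse runs of hyphens (e.g. from " — ") and trim them from the ends
        $title = trim(preg_replace('/-+/', '-', $title), '-');

        // Make lowercase - no need for this to be multibyte
        // since there are only 0-9, a-z, A-Z and hyphens left in the string
        return strtolower($title);
    }

The iconv() call transliterates characters outside plain ASCII, such as accented letters, to their closest English equivalents. The target is ASCII because the regular expression that follows keeps only ASCII letters, digits, and hyphens; anything left outside that range would be deleted outright. Transliteration runs after the dashes are replaced because the str_replace() patterns only match while the title is still UTF-8. Results depend on the platform's iconv implementation and, with glibc, on the current locale (set with setlocale(LC_CTYPE, ...)); characters with no equivalent may come back as "?", which the regular expression then strips. If you have specific language use-cases not covered, you will have to handle those characters yourself.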
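One way to handle specific characters yourself is to map them explicitly before the generic transliteration step. The sketch below assumes German content, where readers may expect "ü" as "ue" and "ß" as "ss" rather than whatever iconv produces. The helper name applyCharacterMap() and the $germanMap table are hypothetical, for illustration only; strtr() is PHP's built-in string translation function.

```php
<?php

/**
 * Hypothetical helper: replace language-specific characters with chosen
 * ASCII spellings. Call it on the title before generating the slug.
 */
function applyCharacterMap(string $text, array $map): string
{
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($text, $map);
}

// Illustrative map for German; extend it for the languages you publish in.
$germanMap = [
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ß' => 'ss',
];

echo applyCharacterMap('Größe über alles', $germanMap), PHP_EOL;
// prints "Groesse ueber alles"
```

Because the map runs first, iconv only sees characters you did not cover explicitly.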

The str_replace() step converts spaced dashes, typographic dashes, underscores, and spaces to hyphens. Note that the first replacement turns a dash into a hyphen: a dash is surrounded by spaces and separates thoughts, while a hyphen connects words and has no spaces around it. The remaining lines are explained in the code comments.
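If you need the slug generator in several controllers, the same approach can live in a small class of its own. This is a minimal sketch: the class name SlugGenerator, the static generate() method, and the $post array standing in for your persistence layer are all hypothetical, not part of any framework.

```php
<?php

/**
 * Hypothetical standalone slug generator that several controllers can share.
 */
final class SlugGenerator
{
    public static function generate(string $title): string
    {
        // Spaced dashes, typographic dashes, underscores and spaces -> hyphens
        $slug = str_replace([' - ', '—', '–', '‒', '―', '_', ' '], '-', $title);

        // Transliterate to ASCII; keep the input if iconv() fails
        $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $slug);
        if ($ascii !== false) {
            $slug = $ascii;
        }

        // Keep only letters, digits and hyphens, then collapse and trim hyphens
        $slug = preg_replace('/[^A-Za-z0-9-]+/', '', $slug);
        $slug = trim(preg_replace('/-+/', '-', $slug), '-');

        return strtolower($slug);
    }
}

// Usage: generate the slug just before persisting the post.
// $post is a placeholder for whatever your save step receives.
$post = ['title' => 'Auto-generating slugs with PHP'];
$post['slug'] = SlugGenerator::generate($post['title']);

echo $post['slug'], PHP_EOL; // prints "auto-generating-slugs-with-php"
```

A static method keeps the call site short and needs no object wiring; if your framework uses dependency injection, an instance method on an injected service works just as well.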

I hope you find this helpful. If you have ways to improve my code, please let me know on Twitter (@rogercreasy).

Thanks for reading!

Publication Date: 2019-01-23 11:59:49