Posted by Bob Winter on 06/25/05 07:20
Dotan Cohen wrote:
> On 6/25/05, Robert Cummings <robert@interjinn.com> wrote:
>
>>On Fri, 2005-06-24 at 21:02, Dotan Cohen wrote:
>>
>>>Hi friends, I've got a nice array of contractions (I've, I'd,
>>>they'll,...). My intent is to take submitted data and replace, say,
>>>every occurrence of 'theyd' with 'they'd'. So far, so good. The trick
>>>is doing it if the first character is uppercase. I tried going
>>>through the array, one by one, and doing the preg_replace twice, once
>>>for each item, and once for each item with the first letter
>>>capitalized. It wasn't very successful, so I've been doing this:
>>>$the_lyrics=str_replace("\bid\b", "I'd", $the_lyrics);
>>>$the_lyrics=str_replace("\bi'd\b", "I'd", $the_lyrics);
>>>$the_lyrics=str_replace("\bId\b", "I'd", $the_lyrics);
>>>$the_lyrics=str_replace("\bim\b", "I'm", $the_lyrics);
>>>$the_lyrics=str_replace("\bi'm\b", "I'm", $the_lyrics);
>>>$the_lyrics=str_replace("\bIm\b", "I'm", $the_lyrics);
>>>$the_lyrics=str_replace("\bi've\b", "I've", $the_lyrics);
>>>$the_lyrics=str_replace("\bive\b", "I've", $the_lyrics);
>>>$the_lyrics=str_replace("\bIve\b", "I've", $the_lyrics);
>>>$the_lyrics=str_replace("\bi'll\b", "I'll", $the_lyrics);
>>>$the_lyrics=str_replace("\bIll\b", "I'll", $the_lyrics);
>>>$the_lyrics=str_replace("\bi\b", "I", $the_lyrics);
>>>$the_lyrics=str_replace("\byoure\b", "you're", $the_lyrics);
>>>$the_lyrics=str_replace("\bYoure\b", "You're", $the_lyrics);
>>>$the_lyrics=str_replace("\byoull\b", "you'll", $the_lyrics);
>>>$the_lyrics=str_replace("\bYoull\b", "You'll", $the_lyrics);
>>>$the_lyrics=str_replace("\byouve\b", "you've", $the_lyrics);
>>>$the_lyrics=str_replace("\bYouve\b", "You've", $the_lyrics);
>>>$the_lyrics=str_replace("\bits\b", "it's", $the_lyrics);
>>>$the_lyrics=str_replace("\bIts\b", "It's", $the_lyrics);
>>>$the_lyrics=str_replace("\bwasnt\b", "wasn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bWasnt\b", "Wasn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bthats\b", "that's", $the_lyrics);
>>>$the_lyrics=str_replace("\bThats\b", "That's", $the_lyrics);
>>>$the_lyrics=str_replace("\btheyre\b", "they're", $the_lyrics);
>>>$the_lyrics=str_replace("\bTheyre\b", "They're", $the_lyrics);
>>>$the_lyrics=str_replace("\btheyll\b", "they'll", $the_lyrics);
>>>$the_lyrics=str_replace("\bTheyll\b", "They'll", $the_lyrics);
>>>$the_lyrics=str_replace("\bcant\b", "can't", $the_lyrics);
>>>$the_lyrics=str_replace("\bCant\b", "Can't", $the_lyrics);
>>>$the_lyrics=str_replace("\bdidnt\b", "didn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bDidnt\b", "Didn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bdont\b", "don't", $the_lyrics);
>>>$the_lyrics=str_replace("\bDont\b", "Don't", $the_lyrics);
>>>$the_lyrics=str_replace("\bdoesnt\b", "doesn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bDoesnt\b", "Doesn't", $the_lyrics);
>>>$the_lyrics=str_replace("\bweve\b", "we've", $the_lyrics);
>>>$the_lyrics=str_replace("\bWeve\b", "We've", $the_lyrics);
>>>
>>>Which, as you can see, is not exactly optimized code. How would
>>>someone more professional than myself go about this? I was thinking
>>>about maybe a two-dimensional array, but stopped short to consult with
>>>you guys first.
>>
>>str_replace() supports taking two arrays from which to retrieve the
>>needles and the replacements so that you only need to invoke the
>>function once. This will speed things up considerably. On that note you
>>have a couple of bugs...
>>
>> "its" is a valid word for possession (its woodwork is exquisite).
>>
>> "Ill" is also valid (Ill beset by fortune).
>>
>>Cheers,
>>Rob.
>>
>>
>
>
> Ill I knew about, its I didn't. I didn't mean to put ill in there...
>
> Should I enter each contraction twice (for the capitalization), or
> should I try to do something smart so that the capitalization will
> happen automatically? The 'I' contractions are special; I will deal
> with those separately.
Dotan,
Your task intrigued me, so I put together a function that will help process your data:
<?php
// Correct spellings of the target words, each with a lower-case first letter.
$list = array();
$list[] = "wasn't";
$list[] = "that's";
$list[] = "they're";
$list[] = "they'll";
$list[] = "can't";
$list[] = "didn't";

// My sample text that needs correction.
$string = "I wasnt there, Theyll tell you I cant and Didnt.";

function addApos($list, $string) {
    // I am assuming that you will make sure that $list is in the correct
    // format and handle any other error checking.
    // Here I am creating two arrays with matching keys: one holds the correct
    // spellings (lower- and upper-case first letter), the other holds the
    // same words with the apostrophe stripped.
    // Then I am using the two arrays to process the string for correction.
    $list_case = array();
    $list_case_strip = array();
    foreach ($list as $value) {
        // Lower-case form, with and without the apostrophe.
        $list_case[] = $value;
        $list_case_strip[] = str_replace("'", "", $value);
        // Same word with the first letter capitalized.
        // (ucfirst() replaces the old $value{0} syntax, which PHP 8 removed.)
        $capitalized = ucfirst($value);
        $list_case[] = $capitalized;
        $list_case_strip[] = str_replace("'", "", $capitalized);
    }
    // One call: each stripped form is swapped for the spelling at the same index.
    return str_replace($list_case_strip, $list_case, $string);
}

$result = addApos($list, $string);
print "Original string: $string<br /><br />";
print "Corrected string: $result<br />";
?>
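One caveat: str_replace() matches plain substrings and does not treat \b as a word boundary, so the function above would also rewrite the "cant" in "cantaloupe". If that matters for your lyrics, here is a rough sketch of a regex variant that only touches whole words and keeps whatever capitalization the first letter had. The name addAposRegex() is just something I made up for illustration, and I am assuming PHP 7 or later for the type hints; treat it as a starting point, not a finished solution.

```php
<?php
// Hypothetical helper (not from the original thread): fixes contractions
// with a word-boundary regex so substrings of longer words are left alone.
function addAposRegex(array $list, string $string): string {
    // Map each apostrophe-free, lower-case form to its correct spelling.
    $map = array();
    foreach ($list as $word) {
        $map[str_replace("'", "", $word)] = $word;
    }
    if (!$map) {
        return $string; // nothing to replace
    }

    // Build a single case-insensitive pattern, e.g. /\b(wasnt|cant)\b/i
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($map));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($map) {
        $fixed = $map[strtolower($m[1])];
        // Keep a capital first letter if the writer typed one.
        return ctype_upper($m[1][0]) ? ucfirst($fixed) : $fixed;
    }, $string);
}

$list = array("wasn't", "that's", "they're", "they'll", "can't", "didn't");
echo addAposRegex($list, "I wasnt there, Theyll tell you I cant eat cantaloupe and Didnt.");
```

Because the match is case-insensitive, you only list each contraction once, in lower case. Words that are valid without the apostrophe ("its", "ill") would still need to be kept out of the list, as Rob pointed out.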
--Bob