Posted by Rik on 10/25/06 00:31
David wrote:
> Hi,
>
> Could PHP be used to take a txt file (or set of txt files) and add a
> string of characters every X number of words or characters?
$text = file_get_contents('/path/to/text.txt');
$text = chunk_split($text,5000,$string_to_add);
> Say a txt file with 50,000 characters/5,000 words how would you go
> about adding a string of characters every 5,000 characters or 500
> words.
For characters it's easy, see above.
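A small sketch of the character case, scaled down so the result is visible. The sample text and the chunk length of 4 are made up for illustration. Note that chunk_split() also appends the string after the last chunk; if that is unwanted, str_split() plus implode() is one way around it.

```php
<?php
// Assumed sample values, scaled down from 5,000 so the effect is easy to see.
$text = 'abcdefghij';
$string_to_add = '|';

// chunk_split() inserts the string after every 4 characters
// and once more after the final (possibly shorter) chunk.
$with_trailing = chunk_split($text, 4, $string_to_add);

// str_split() + implode() puts the string only between chunks.
$between_only = implode($string_to_add, str_split($text, 4));

echo $with_trailing, "\n"; // abcd|efgh|ij|
echo $between_only, "\n";  // abcd|efgh|ij
```

Both functions count bytes, so multi-byte text would need different handling.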
For words, it's a little bit harder. One could fiddle around with
str_word_count(), but I would not think that the best solution.
If it does not have to be exact:
preg_match_all('/(?:(?:^|\W*)\w*){0,500}/s',$text,$matches);
$text = implode($string_to_add,$matches[0]);
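If an exact word count matters, one possible approach is to count words inside a callback. This is only a sketch: insert_every_n_words() is a hypothetical helper, and it assumes a "word" is a run of \w characters, so "don't" counts as two words.

```php
<?php
// Hypothetical helper (not a built-in): append $insert after every $n-th word.
// Assumption: a "word" is whatever \w+ matches.
function insert_every_n_words(string $text, string $insert, int $n): string
{
    $count = 0;
    return preg_replace_callback('/\w+/', function (array $m) use (&$count, $insert, $n) {
        $count++;
        // The callback's return value is used literally, so $insert needs no escaping.
        return ($count % $n === 0) ? $m[0] . $insert : $m[0];
    }, $text);
}

echo insert_every_n_words('one two three four five', ' [X]', 2), "\n";
// one two [X] three four [X] five
```

For the original question you would call it with $n = 500.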
> To improve on this I'd want to if using characters as the guide to use
> a space or better yet a line break as the point to add the string of
> characters. So 5,000 characters to the nearest line break.
********** TRY 1 *****************************************
/* settings */
$string_to_add = 'Hey, this is added!!!!!!!';
$char_to_split = "\n";
$charcount_to_split = 200;
/* match char_to_split */
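/* quote into a separate variable, so strlen($char_to_split) further down still measures the raw string */
$split_pattern = preg_quote($char_to_split,'/');
preg_match_all('/'.$split_pattern.'/',$text,$matches,PREG_OFFSET_CAPTURE);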
/* add difference to desired position, and which occurrence */
$available_line_breaks = $matches[0];
function diffs(&$value,$key,$number){
$occ = round($value[1]/$number,0);
$value['occ'] = $occ;
$value['diff'] = abs($value[1] - ($occ * $number));
}
array_walk($available_line_breaks,'diffs',$charcount_to_split);
/* determine which line-break is closest */
$closest = array();
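function closest(&$value,$key,&$closest){
    if(!isset($closest[$value['occ']]) || $closest[$value['occ']]['diff'] > $value['diff']){
        $closest[$value['occ']] = array('diff' => $value['diff'],'offset' => $value[1]);
    }
}
/* call-time pass-by-reference (&$closest in the call) is a fatal error since PHP 5.4, so loop explicitly */
foreach($available_line_breaks as $key => $value){
    closest($value,$key,$closest);
}
/* create_function() was removed in PHP 8.0; a closure does the same job */
array_walk($closest, function(&$a){ $a = $a['offset']; });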
/* this code means that if there are no available line-breaks around, there
will be no value. To illustrate: */
$not_set = array_diff(range(1,floor(strlen($text)/$charcount_to_split)),array_keys($closest));
echo "For the following repeats of $charcount_to_split, no linebreaks were found: ".implode(',',$not_set);
/* you could search for a word-boundary (\W) in that region, I've left that
out */
/* Let's add the string, from last to first, otherwise our offsets are off... */
krsort($closest);
foreach($closest as $target){
$text =
substr_replace($text,$string_to_add,$target+strlen($char_to_split),0);
}
*******************************************************
But of course, this is bullsh*t.
********** TRY 2 *****************************************
/*
$text = text to adapt.
$insert = string to add.
$count = preferred number of characters.
$split = string to split on.
$variance = the number of characters to search left and right.
*/
function replace_text_several_times($text,$insert,$count,$split,$variance = 50){
    $split = preg_quote($split,'/');
    $regex = '/(.{'.($count-$variance).','.($count+$variance).'})('.$split.')/si';
    /* ${1}${2} instead of $1$2, so an $insert that starts with a digit is not read as part of the group number */
    return preg_replace($regex,'${1}${2}'.$insert,$text);
}
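The code above will not insert at exact multiples of $count: each insert lands
on a $split found between $count-$variance and $count+$variance characters
after the previous one. It will nevertheless keep repeating the string through
the whole text, provided your $split occurs within that window.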
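To see the windowed matching at work, here is a scaled-down sketch. It is a variant rather than the function above: insert_near_split() is a hypothetical name, and it uses preg_replace_callback() so that $insert is always taken literally, even if it contains $ or backslashes. The sample text and the numbers 8 and 3 are made up.

```php
<?php
// Hypothetical variant, not the function from the post: same window idea,
// but the inserted string is returned from a callback and never parsed
// as a replacement pattern.
function insert_near_split(string $text, string $insert, int $count, string $split = "\n", int $variance = 50): string
{
    $min = max(0, $count - $variance);
    $max = $count + $variance;
    $regex = '/.{' . $min . ',' . $max . '}' . preg_quote($split, '/') . '/s';
    return preg_replace_callback($regex, function (array $m) use ($insert) {
        return $m[0] . $insert;
    }, $text);
}

// Four 5-byte lines; aim for every 8 characters, give or take 3.
$text = "aaaa\nbbbb\ncccc\ndddd\n";
$result = insert_near_split($text, "--\n", 8, "\n", 3);
echo $result;
```

With these values the marker ends up after every second line, because the greedy window backs off to the last line break that still fits.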
--
Rik Wasmus