Posted by billious on 10/09/06 05:38
"batman" <uspensky@gmail.com> wrote in message
news:1160368462.065423.142350@m7g2000cwm.googlegroups.com...
> billious wrote:
>>i have a text file that is like:
>>
>>date = OCT0606
>>asdf
>>START-OF-DATA
>>asdfasdfg
>>asdfgsdfg
>>END-OF-DATA
>>asdfgalsdkdfklmlkm
>>
>>i need to clear everything from this file except the data between the
>>START-OF-DATA and END-OF-DATA using a batch file... alternatively i am
>>open to suggestions of how to import using bulk insert in sql without
>>changing the file at all. data is pipe separated but obviously has
>>plenty of junk data in it. i have 2 similar files at about 30mb and
>>60mb in size. thanks everyone
>>
>>
>> Since you are posting from XP, do you want an XP solution?
>> alt.msdos.batch.nt deals with NT-series, and alt.msdos.batch with DOS and
>> 9x.
>>
[snip]
>> Without better knowledge of your file's content, refining this is a
>> little difficult.
>
> here is some actual content from the file (shortened of course to just a
> few records) ideally i would like to avoid using a batch file and keep
> it all on the sql level (sql 2000).... the file pasted in here much
> prettier than it looks in notepad (with the squares)
>
> thanks for ur help
>
>
> ------ file starts below
> START-OF-FILE
> PROGRAMNAME=getdata
[sniparoo]
> END-OF-FIELDS
>
> TIMESTARTED=Tue Sep 26 17:33:28 EDT 2006
> START-OF-DATA
> AA US Equity|0|95|AA|US|ALCOA INC|US|USD|Common
> Stock|1.000000000|New York|UN|3334|ALUMINUM
> PROD|Mining|Metal-Aluminum|Basic
[more in this vein...]
> Stock|1000|N|N.A.|
> END-OF-DATA
> DATARECORDS=57129
> TIMEFINISHED=Tue Sep 26 17:51:45 EDT 2006
> END-OF-FILE
>
> ------ file end above
Without knowing quite where the "squares" are, other than surrounding the
START/END-OF-FILE, and assuming that "10" is 10 decimal, not 10 hex, then the
following MAY work. I can't test it without knowing where else these
non-alphamerics are, or whether reproducing them is important.
----- batch begins -------
[1]@echo off
[2]setlocal enabledelayedexpansion
[3]set yel=N
[4]for /f "tokens=*" %%i in (psdx.txt) do call :process "%%i" &if !yel!==Y
echo %%i>>psdout.txt
[5]goto :eof
[6]
[7]:process
[8]:: if last-line-processed was "START-OF-DATA" switch ON echoing
[9]if %yel%==F set yel=Y
[10]:: get just first token - removing embarrassing pipes
[11]for /f "tokens=1 delims=|" %%j in (%1) do set ytd=%%j
[12]if not defined ytd goto :eof
[13]:: remove nasty squares either end of "START/END-OF-DATA"
[14]::set ytd=%ytd:~1,-1%
[15]if "%ytd%"=="START-OF-DATA" set yel=F
[16]if "%ytd%"=="END-OF-DATA" set yel=N
[17]goto :eof
------ batch ends --------
Each line starts with [number] - any line not starting with [number] has been
wrapped and should be rejoined to the line before it. The [number] that starts
each line should be removed.
The label :eof is defined in NT+ to be end-of-file but MUST be expressed
as :eof (colon included).
[14] is deliberately commented-out. If the "square" is ^J (10 decimal) then
the process appears to work as-is. If the square is ^P (10 HEX) then you'd
need to remove the colons from [14]
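To see what [14] would do once uncommented, here is a minimal sketch of the same substring trim on a made-up value. The letter "x" is only a stand-in for the unknown "square" character, and it assumes there is exactly one such character on each side of the text.

```bat
@echo off
setlocal
:: "x" is a hypothetical placeholder for the "square" on either side
set ytd=xSTART-OF-DATAx
:: ~1,-1 skips the first character and drops the last one
set ytd=%ytd:~1,-1%
echo %ytd%
```

Run under XP's cmd.exe this should echo START-OF-DATA, which is the form that [15] and [16] compare against.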
And I obviously haven't checked that the reproduced data is PRECISELY the
same as the source.
And PLEASE end-post. Top-posting is for emails, not Usenet.