Posted by Stian Berger on 02/12/05 00:11
On Thu, 10 Feb 2005 12:28:19 -0700, Jason Motes <php@imotes.com> wrote:
> Hello,
>
> Is there any way to retrieve the properties from a pdf file using php?
>
> When you right click on a pdf file in windows you can see the title of
> the file and you can change this property there also.
>
> I wrote a php page that lists all files in a certain directory. I want
> to be able to show the actual title of the document instead of just the
> file name.
>
> I have searched the manual and google, everything that comes up refers
> to generating pdfs on the fly, not working with an already made pdf.
>
>
> Thanks in Advance,
>
> Jason Motes
If you study the structure of a PDF in a text editor, you'll notice that
it is quite readable. If you go to the end of a document, preferably a
small one, you will see the cross-reference table followed by the
trailer. The cross-reference table lists the byte positions of all the
objects the PDF document contains. These objects can be images, text,
page descriptions, page groups and so on. The trailer can also contain a
reference to the properties of the document (the /Info entry), which can
hold creator, author, title etc. It is easy to read the objects, but
changing them can prove rather difficult: unless you keep the new info
the same length, every object that comes after the properties shifts,
so you need to update the byte positions recorded for all of them.
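To make the part about the end of the file a bit more concrete, here is a rough sketch that pulls the two interesting values out of the tail of a PDF: the object number the /Info entry points at, and the byte position given after "startxref". The function name pdf_trailer_info and the sample trailer are made up for illustration, and the sketch assumes a classic plain-text trailer; newer PDFs that store this in a compressed cross-reference stream will not match.

```php
<?php
// Hypothetical helper (not from any library): inspect the tail of a PDF.
// Assumes a classic, uncompressed trailer near the end of the file.
function pdf_trailer_info(string $pdf): array
{
    // The trailer sits at the end, so only the last couple of KB are needed.
    $tail = substr($pdf, -2048);
    $result = ['info_object' => null, 'startxref' => null];

    // "/Info 5 0 R" means: the document properties live in object 5.
    if (preg_match('~/Info\s+(\d+)\s+(\d+)\s+R~', $tail, $m)) {
        $result['info_object'] = (int) $m[1];
    }
    // "startxref" is followed by the byte position of the cross-reference table.
    if (preg_match('~startxref\s+(\d+)~', $tail, $m)) {
        $result['startxref'] = (int) $m[1];
    }
    return $result;
}

// A made-up trailer, shortened for illustration.
$sample = "trailer\n<< /Size 6 /Root 1 0 R /Info 5 0 R >>\nstartxref\n1234\n%%EOF\n";
print_r(pdf_trailer_info($sample));
```

With a real file you would pass in file_get_contents($pdf) instead of the sample string. If either value comes back as null, the file most likely does not use the plain-text layout described above.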
But as you only need to extract the title, you can do this with a simple
regexp.
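// $dir is assumed to hold the directory you are listing.
foreach (glob($dir . '/*.pdf') as $pdf) {
    $filename = basename($pdf);
    $document = file_get_contents($pdf);
    // Look for a plain-text "/Title (...)" entry in the document properties.
    if ($document !== false
        && preg_match('/\/Title\s*\((.*?)\)/i', $document, $match)) {
        $title = $match[1];
    } else {
        // No readable title found: fall back to the file name.
        $title = $filename;
    }
    echo $title, "\n";
}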
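One thing to watch out for: the simple pattern stops at the first ")", so a title that contains an escaped parenthesis, like "Report \(draft\)", gets cut short. Below is a slightly more careful sketch. The name pdf_title and the sample data are my own invention, and it still assumes the title is stored as an uncompressed literal string without nested unescaped parentheses; anything else falls back to the file name.

```php
<?php
// Hypothetical helper, a sketch only: extract the title or return a fallback.
function pdf_title(string $document, string $fallback): string
{
    // Match "/Title (...)" where the string may contain backslash escapes.
    // Assumes an uncompressed literal string with no nested bare parentheses.
    $pattern = '~/Title\s*\(((?:\\\\.|[^\\\\()])*)\)~s';
    if (!preg_match($pattern, $document, $match)) {
        return $fallback;
    }
    // Turn escaped parentheses and escaped backslashes back into plain characters.
    return preg_replace('~\\\\([()\\\\])~', '$1', $match[1]) ?? $match[1];
}

// Made-up sample data for illustration.
$sample = "<< /Title (Annual Report \\(draft\\)) /Producer (Example) >>";
echo pdf_title($sample, 'report.pdf'), "\n";
echo pdf_title('no title in here', 'report.pdf'), "\n";
```

In the directory loop you would call it as pdf_title($document, $filename) in place of the if/else.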
Hope this helps.
--
Stian