unoconv: converting between all document formats supported by OpenOffice

This blog post was published on 2010-12-14 and may be out of date.

unoconv converts between many document formats. Supported formats include “Open Document Format” (.odt), “MS Word” (.doc), “MS Office Open/MS OOXML” (.xml), “Portable Document Format” (.pdf), “HTML”, “XHTML”, “RTF”, “Docbook” (.xml), and more.

Features:

  • converts every format that OpenOffice supports
  • OpenOffice supports up to 100 document formats :-)
  • can be used to automate tasks (scripts, e.g. shell or PHP)
  • works together with other tools such as “asciidoc” and “docbook2odf/xhtml2odt”
  • can apply style templates during conversion (e.g. for a corporate identity)
  • can act both as a server and as a client
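
To illustrate the automation point, here is a minimal sketch of a batch conversion in shell. The function name `convert_all` and the `UNOCONV` override variable are made up for this example; only the `-f` option comes from unoconv itself. The sketch assumes that unoconv writes each result next to its input file when no output name is given.

```shell
#!/bin/sh
# Command to run; override UNOCONV to dry-run the loop without OpenOffice.
UNOCONV="${UNOCONV:-unoconv}"

# convert_all DIR FORMAT: convert every .odt file in DIR to FORMAT.
# (Hypothetical helper, not part of unoconv.)
convert_all() {
    dir=$1
    format=$2
    for file in "$dir"/*.odt; do
        # Skip the unexpanded pattern when DIR contains no .odt files.
        [ -e "$file" ] || continue
        "$UNOCONV" -f "$format" "$file" || echo "failed: $file" >&2
    done
}
```

A call such as `convert_all ./documents pdf` would then try to convert every `.odt` file in `./documents`, reporting failures on standard error without aborting the loop.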


Formats:

The following is a list of the export formats of OpenOffice (and therefore of unoconv). The supported input formats may differ (import vs. export).


Export:

  • bib – BibTeX [.bib]
  • doc – Microsoft Word 97/2000/XP [.doc]
  • doc6 – Microsoft Word 6.0 [.doc]
  • doc95 – Microsoft Word 95 [.doc]
  • docbook – DocBook [.xml]
  • html – HTML Document (OpenOffice.org Writer) [.html]
  • odt – Open Document Text [.odt]
  • ott – Open Document Text [.ott]
  • ooxml – Microsoft Office Open XML [.xml]
  • pdb – AportisDoc (Palm) [.pdb]
  • pdf – Portable Document Format [.pdf]
  • psw – Pocket Word [.psw]
  • rtf – Rich Text Format [.rtf]
  • latex – LaTeX 2e [.ltx]
  • sdw – StarWriter 5.0 [.sdw]
  • sdw4 – StarWriter 4.0 [.sdw]
  • sdw3 – StarWriter 3.0 [.sdw]
  • stw – Open Office.org 1.0 Text Document Template [.stw]
  • sxw – Open Office.org 1.0 Text Document [.sxw]
  • text – Text Encoded [.txt]
  • txt – Plain Text [.txt]
  • vor – StarWriter 5.0 Template [.vor]
  • vor4 – StarWriter 4.0 Template [.vor]
  • vor3 – StarWriter 3.0 Template [.vor]
  • xhtml – XHTML Document [.html]
  • […]


Installation:

aptitude install unoconv asciidoc docbook2odf


Example 1: Default

First a simple example that just converts an “odt” file to “pdf”. It is also well worth taking a look at the available options (unoconv --help).

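# start the unoconv listener
unoconv --listener &
# odt -> pdf
unoconv -f pdf some-document.odt
# the same conversion with the default server and port written out; --stdout prints the PDF to standard output
unoconv --server localhost --port 2002 --stdout -f pdf some-document.odt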
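
Because `--stdout` sends the converted document to standard output, the result can be redirected into a file and sanity-checked. The following is only a sketch: the helper names `is_pdf` and `convert_to_pdf` and the `UNOCONV` override variable are invented for this example, and the check merely looks for the `%PDF` signature at the start of the file.

```shell
#!/bin/sh
# is_pdf FILE: succeed if FILE starts with the PDF signature "%PDF".
# (Hypothetical helper, not part of unoconv.)
is_pdf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "%PDF" ]
}

# convert_to_pdf INPUT OUTPUT: convert INPUT via --stdout, write the
# result to OUTPUT and verify that it looks like a PDF.
convert_to_pdf() {
    "${UNOCONV:-unoconv}" --stdout -f pdf "$1" > "$2" && is_pdf "$2"
}
```

For example, `convert_to_pdf some-document.odt some-document.pdf || echo "conversion failed"` reports an error if unoconv fails or the output does not start with `%PDF`.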

Example 2: Template

As the developer's site already points out, a screenshot does not really help here, so a second example using templates follows.

# download the example files (the commands below use resume.txt and template.ott; substitute the downloaded .txt and .ott files)
wget http://dag.wieers.com/cv/Makefile
wget http://dag.wieers.com/cv/curriculum-vitae-dag-wieers.txt
wget http://dag.wieers.com/cv/curriculum-vitae-docbook.ott

# start the unoconv listener
unoconv --listener &
# resume.txt -> resume.xml
asciidoc -b docbook -d article -o resume.xml resume.txt
# resume.xml -> resume.tmp.odt
docbook2odf -f --params generate.meta=0 -o resume.tmp.odt resume.xml
# resume.tmp.odt -> resume.odt + template
unoconv -f odt -t template.ott -o resume.odt resume.tmp.odt
# resume.odt -> resume.pdf + template
unoconv -f pdf -t template.ott -o resume.pdf resume.odt
# resume.odt -> resume.html + template
unoconv -f html -t template.ott -o resume.html resume.odt
# resume.odt -> resume.doc + template
unoconv -f doc -t template.ott -o resume.doc resume.odt
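
The last three conversions differ only in the target format, so they could be folded into a loop. This is a sketch under assumptions: the function name `render_all` and the `UNOCONV` override variable are invented here, while `-f`, `-t` and `-o` are used exactly as in the commands above.

```shell
#!/bin/sh
# Command to run; override UNOCONV to dry-run the loop without OpenOffice.
UNOCONV="${UNOCONV:-unoconv}"

# render_all TEMPLATE SOURCE FORMAT...: convert SOURCE once per FORMAT,
# applying TEMPLATE each time and naming each output after SOURCE.
# (Hypothetical helper, not part of unoconv.)
render_all() {
    template=$1
    source_file=$2
    shift 2
    base=${source_file%.*}   # strip the extension, e.g. resume.odt -> resume
    for format in "$@"; do
        "$UNOCONV" -f "$format" -t "$template" -o "$base.$format" "$source_file" || return 1
    done
}
```

With the file names from the example, `render_all template.ott resume.odt pdf html doc` would issue the same three unoconv calls and stop at the first one that fails.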

Example 3: Server <-> Client

As mentioned above, the listener can also be started as a server and accessed from other machines.

# start the unoconv listener as a server
unoconv --listener --server 1.2.3.4 --port 4567
# client -> server (the format and file name are example values)
unoconv --server 1.2.3.4 --port 4567 -f pdf some-document.odt
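
On the client side, the server address can be kept in one place with a small wrapper. Again this is just a sketch: `remote_convert`, `UNOCONV_HOST`, `UNOCONV_PORT` and `UNOCONV` are names made up for this example, and the default address and port are the placeholder values from the listing above.

```shell
#!/bin/sh
# Placeholder address and port, taken from the example above.
UNOCONV_HOST="${UNOCONV_HOST:-1.2.3.4}"
UNOCONV_PORT="${UNOCONV_PORT:-4567}"

# remote_convert FORMAT FILE: ask the remote listener to convert FILE.
# (Hypothetical helper, not part of unoconv.)
remote_convert() {
    "${UNOCONV:-unoconv}" --server "$UNOCONV_HOST" --port "$UNOCONV_PORT" -f "$1" "$2"
}
```

A call then shrinks to `remote_convert pdf some-document.odt`, and pointing all conversions at a different server only requires changing the two variables.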

Example 4: PHP

All of this can now be used in shell scripts or, as in this example, embedded in PHP.

$this->Filegenerator = new FilegeneratorComponent ($this->params["form"]['uploaddocfile']);
// if the file generator was created, process the document
if($this->Filegenerator) {
// returns the text version of the document
$text = $this->Filegenerator->convertDocToTxt();
// returns the HTML version of the document
$html = $this->Filegenerator->convertDocToHtml();
// returns the file name of the generated PDF
$pdf = $this->Filegenerator->convertDocToPdf($doc_id);
}
<?php
/**
* Class Used to convert files.
*@author jamiescott.net
*/
class FilegeneratorComponent extends Object {

// allowed upload MIME types and their file extensions
private $allowable_files = array ('application/msword' => 'doc' );
// set to true if the constructor completed successfully
private $pass = false;
// stores the file info passed to the constructor
private $fileinfo;

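/**
* Validates the uploaded file and creates the temporary working folder.
*
* @param array $fileinfo
* Expected:
* (
*   [name] => test.doc
*   [type] => application/msword
*   [tmp_name] => /Applications/MAMP/tmp/php/php09PYNO
*   [error] => 0
*   [size] => 79360
* )
*
* @return bool true on success (`new` ignores a constructor's return value; $this->pass records the result)
*/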
function __construct($fileinfo) {

// folder to process all the files etc
define ( 'TMP_FOLDER', TMP . 'filegenerator/' . $this->generatefoldername () . '/' );

// where unoconv is installed
define ( 'UNOCONV_PATH', '/usr/bin/unoconv' );
// where to store pdf files
define ( 'PDFSTORE', ROOT . '/uploads/generatedpdfs/' );
// where to store doc files
define ( 'DOCSTORE', ROOT . '/uploads/docfiles/' );
// apache home dir
define ( 'APACHEHOME', '/home/apache' );
// set some shell environment variables
putenv ( "HOME=".APACHEHOME );
putenv ( "PWD=".APACHEHOME );

// check that the file info was passed, the tmp file exists, the file type is allowed
// and the tmp folder could be created
if (is_array ( $fileinfo ) && file_exists ( $fileinfo ['tmp_name'] ) && in_array ( $fileinfo ['type'], array_keys ( $this->allowable_files ) ) && $this->createtmp ()) {

// pass by reference
$this->fileinfo = &$fileinfo;
// the constructor ran ok
$this->pass = true;
// signal success (PHP ignores a constructor's return value; the methods check $this->pass)
return true;

} else {
// failed to instantiate
return false;

}

}

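/**
* Takes the file set in the constructor and turns it into a PDF,
* stores it in a subfolder of PDFSTORE (/uploads/generatedpdfs/) and returns the file name
*
* @param string|false $foldername subfolder of PDFSTORE to store the PDF in
* @return string file name of the generated PDF
*/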
function convertDocToPdf($foldername=false) {

if ($this->pass) {

// generate a random name
$output_pdf_name = $this->generatefoldername () . '.pdf';

// move it to the tmp folder for processing
if (! copy ( $this->fileinfo ['tmp_name'], TMP_FOLDER . 'input.doc' ))
die ( 'Error copying the doc file' );

$command = UNOCONV_PATH;
$args = ' --server localhost --port 2002 --stdout -f pdf ' . TMP_FOLDER . 'input.doc';

$run = $command . $args;

//echo $run; die;
$pdf = shell_exec ( $run );
$end_of_line = strpos ( $pdf, "\n" );
$start_of_file = substr ( $pdf, 0, $end_of_line );

// eregi() was removed in PHP 7; stripos() performs the same case-insensitive check
if (stripos ( $start_of_file, '%PDF' ) === false)
die ( 'Error Generating the PDF file' );

if(!file_exists(PDFSTORE.$foldername)){
mkdir(PDFSTORE.$foldername);
}

// file saved
if(!$this->_createandsave($pdf, PDFSTORE.'/'.$foldername.'/', $output_pdf_name)){
die('Error Saving The PDF');
}

return $output_pdf_name;

}

}

/**
* Return a text version of the Doc
*
* @return unknown
*/
function convertDocToTxt() {

if ($this->pass) {

// move it to the tmp folder for processing
if (! copy ( $this->fileinfo ['tmp_name'], TMP_FOLDER . 'input.doc' ))
die ( 'Error copying the doc file' );

$command = UNOCONV_PATH;
$args = ' --server localhost --port 2002 --stdout -f txt ' . TMP_FOLDER . 'input.doc';

$run = $command . $args;

//echo $run; die;
$txt = shell_exec ( $run );

// heuristic: fewer than 10 characters of output most likely means the conversion failed
if (strlen($txt) < 10)
die ( 'Error Generating the TXT' );

// return the text extracted from the document
return $txt;

}

}

/**
* Convert the doc to HTML and return the HTML
*
* @return unknown
*/
function convertDocToHtml() {

if ($this->pass) {

// move it to the tmp folder for processing
if (! copy ( $this->fileinfo ['tmp_name'], TMP_FOLDER . 'input.doc' ))
die ( 'Error copying the doc file' );

$command = UNOCONV_PATH;
$args = ' --server localhost --port 2002 --stdout -f html ' . TMP_FOLDER . 'input.doc';

$run = $command . $args;

//echo $run; die;
$html= shell_exec ( $run );
$end_of_line = strpos ( $html, "\n" );
$start_of_file = substr ( $html, 0, $end_of_line );

// eregi() was removed in PHP 7; stripos() performs the same case-insensitive check
if (stripos ( $start_of_file, 'HTML' ) === false)
die ( 'Error Generating the HTML' );

// return the HTML generated from the document
return $html;

}

}
/**
* Create file and store data
*
* @param unknown_type $data
* @param unknown_type $location
* @return unknown
*/
function _createandsave($data, $location, $file) {

if (is_writable ( $location )) {

// Open the target file for writing ('w' truncates any existing file).
if (! $handle = fopen ( $location.$file, 'w' )) {
trigger_error("Cannot open file ($location$file)");
return false;
}

// Write the data to the opened file.
if (fwrite ( $handle, $data ) === FALSE) {
trigger_error("Cannot write to file ($location$file)");
return false;
}

fclose ( $handle );
return true;

} else {
trigger_error("The directory $location is not writable");
return false;
}

}

function __destruct() {

// remove the tmp folder

if (file_exists ( TMP_FOLDER ) && strlen ( TMP_FOLDER ) > 4)
$this->removetmp ();

}

/**
* Create the tmp directory to hold and process the files
*
* @return unknown
*/
function createtmp() {

if (is_writable ( TMP )) {

if (mkdir ( TMP_FOLDER ))
return true;

} else {

return false;
}

return false;

}

/**
* Delete the tmp dir
*
* @return unknown
*/
function removetmp() {

if (strlen ( TMP_FOLDER ) > 3 && file_exists ( TMP_FOLDER )) {

if ($this->recursive_remove_directory ( TMP_FOLDER ))
return true;

}

return false;
}

/**
* Return a random string for the folder name
*
* @return unknown
*/
function generatefoldername() {

return md5 ( microtime () );

}

/**
* Recursively delete a directory or empty it
*
* @param unknown_type $directory
* @param unknown_type $empty
* @return unknown
*/
function recursive_remove_directory($directory, $empty = FALSE) {
// if the path has a slash at the end we remove it here
if (substr ( $directory, - 1 ) == '/') {
$directory = substr ( $directory, 0, - 1 );
}

// if the path is not valid or is not a directory ...
if (! file_exists ( $directory ) || ! is_dir ( $directory )) {
// ... we return false and exit the function
return FALSE;

// ... if the path is not readable
} elseif (! is_readable ( $directory )) {
// ... we return false and exit the function
return FALSE;

// ... else if the path is readable
} else {

// we open the directory
$handle = opendir ( $directory );

// and scan through the items inside
while ( FALSE !== ($item = readdir ( $handle )) ) {
// if the filepointer is not the current directory
// or the parent directory
if ($item != '.' && $item != '..') {
// we build the new path to delete
$path = $directory . '/' . $item;

// if the new path is a directory
if (is_dir ( $path )) {
// we call this function with the new path
$this->recursive_remove_directory ( $path );

// if the new path is a file
} else {
// we remove the file
unlink ( $path );
}
}
}
// close the directory
closedir ( $handle );

// if the option to empty is not set to true
if ($empty == FALSE) {
// try to delete the now empty directory
if (! rmdir ( $directory )) {
// return false if not possible
return FALSE;
}
}
// return success
return TRUE;
}
}
}
?>

Published by

voku

Lars Moelleken | I am root, I am allowed to do that!
