[R6RS] Transcoding with and without buffering

Michael Sperber sperber at informatik.uni-tuebingen.de
Mon Jul 24 13:13:32 EDT 2006


I've hacked up a little C program to measure the difference between
transcoding in bulk and transcoding byte-by-byte using the POSIX iconv
library.  I expect that iconv or ICU or some other external library
will be a popular and reasonable implementation strategy for
performing the transcoding.  The program has two functions: one feeds
the contents of a test file to iconv as a whole, the other feeds it
byte-by-byte (both transcode from UTF-8 to UTF-8).
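
To make the byte-by-byte strategy concrete, here is a minimal sketch of the mechanism it relies on: when iconv is handed only part of a multibyte sequence it fails with EINVAL, and the caller retries with one more byte. The helper name count_bytewise_calls is hypothetical and not part of the attached program, and the sketch assumes the POSIX prototype in which iconv takes a char** for its input.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not from the attached program: transcode n bytes
   of UTF-8 one byte at a time and return the number of iconv() calls
   made, or -1 on invalid input or any other failure. */
static long
count_bytewise_calls(const char* input, size_t n, char* out, size_t out_size)
{
  /* Assumes the POSIX prototype, where iconv() takes char** for input. */
  char* in = (char*)input;
  char* end = in + n;
  long calls = 0;
  iconv_t ic;

  assert(input != NULL && out != NULL);
  ic = iconv_open("UTF-8", "UTF-8");
  if (ic == (iconv_t)-1)
    return -1;

  while (in < end)
    {
      size_t in_count = 1;
      for (;;)
        {
          size_t result = iconv(ic, &in, &in_count, &out, &out_size);
          ++calls;
          if (result != (size_t)-1)
            break;              /* a complete character was converted */
          if (errno == EINVAL && in + in_count < end)
            ++in_count;         /* incomplete sequence: offer one more byte */
          else
            {
              iconv_close(ic);  /* EILSEQ, E2BIG, or truncated input */
              return -1;
            }
        }
    }

  iconv_close(ic);
  return calls;
}
```

For a string consisting of 'a', a two-byte character, and 'b' (4 bytes), this should take 4 calls: one for each ASCII byte, plus a failed and a successful call for the two-byte character. Transcoding in bulk needs a single call for the same input.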

On an 867 MHz PowerBook G4, using the iconv shipped with
the latest Mac OS X, I get:

Michael-Sperbers-Computer[266] ll UTF-8-demo.txt 
-rw-r--r--   1 sperber  PUstaff  14056 Dec  9  2004 UTF-8-demo.txt

With buffering (test1 enabled in main):
Michael-Sperbers-Computer[263] time ./a.out UTF-8-demo.txt
11.347u 0.998s 0:12.90 95.5%    0+0k 0+2io 0pf+0w

Without buffering (test2 enabled in main):
Michael-Sperbers-Computer[265] time ./a.out UTF-8-demo.txt
76.096u 2.209s 1:20.51 97.2%    0+0k 0+2io 0pf+0w

Of course, the test program isn't entirely representative of the
transcoding machinery in a realistic I/O library, but it does show
that the overhead is substantial: byte-by-byte transcoding took more
than six times as long as transcoding in bulk.
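
As a rough worked example, the timings can be turned into a per-byte cost. This is only a sketch: it assumes both timed runs used the 10000-pass loop in main over the whole 14056-byte file, and the helper name ns_per_byte is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: nanoseconds of user CPU time spent per input
   byte, given the user time of a run, the number of passes over the
   file, and the file size in bytes. */
static double
ns_per_byte(double user_seconds, long passes, long file_bytes)
{
  assert(passes > 0 && file_bytes > 0);
  return user_seconds * 1e9 / ((double)passes * (double)file_bytes);
}
```

Under those assumptions, ns_per_byte(11.347, 10000, 14056) comes to about 80.7 ns per byte with buffering and ns_per_byte(76.096, 10000, 14056) to about 541.4 ns per byte without, a ratio of roughly 6.7 in user time.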

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
-------------- next part --------------
#include <errno.h>   /* errno and EINVAL, used in test2 */
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

static void
test1(const char* in, size_t n)
{
  char* buf = (char*)malloc(65536);
  char* out = buf;

  iconv_t ic = iconv_open("UTF-8", "UTF-8");

  size_t n_out = 65536;
  size_t result = iconv(ic, &in, &n, &out, &n_out);

  iconv_close(ic);

  free(buf);

  /* printf("n=%zu n_out=%zu result=%zu\n", n, n_out, result); */
}

static void
test2(const char* in, size_t n)
{
  const char* in_start = in;
  char* buf = (char*)malloc(65536);
  char* out = buf;

  iconv_t ic = iconv_open("UTF-8", "UTF-8");

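  while (in < in_start + n)
    {
      size_t in_count = 1;
      for (;;)
        {
          /* Space left in buf; iconv advances out as it writes. */
          size_t n_out = 65536 - (size_t)(out - buf);
          size_t result = iconv(ic, &in, &in_count, &out, &n_out);
          /* printf("in=%p in_count=%zu n_out=%zu result=%zu\n", (void*)in, in_count, n_out, result); */
          if (result != (size_t)-1)
            break;
          /* EINVAL means an incomplete multibyte sequence: retry with one more byte. */
          if ((errno == EINVAL) && (in + in_count < in_start + n))
            ++in_count;
          else
            {
              /* Invalid or truncated input: give up instead of looping forever. */
              in = in_start + n;
              break;
            }
        }
    }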

  free(buf);

  iconv_close(ic);
}

int
main(int argc, char* argv[])
{
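  FILE* f;
  char* in;
  size_t n;

  /* Fail cleanly if no file was named or it cannot be opened. */
  if (argc < 2 || (f = fopen(argv[1], "r")) == NULL)
    {
      fprintf(stderr, "usage: a.out FILE (FILE must be readable)\n");
      return 1;
    }
  in = (char*)malloc(65536);
  n = fread(in, 1, 65536, f);
  fclose(f);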

  int i = 0;
  while (i < 10000)
    {
      /* test1(in, n); */
      test2(in, n);
      ++i;
    }
  return 0;
}

