Andy Reitz (blog)

 

 

Compiling perl 5.10.0 as a 64-bit binary on Mac OS X 10.5


This morning, one of the science folks here at work asked me to install a 64-bit version of perl on one of the Mac Pros that we have, which happened to be running Mac OS X 10.5 (aka "Leopard"). I first checked to see if Apple hadn't already taken care of this for me, but alas, they had not:

amy:$ file /usr/bin/perl
/usr/bin/perl: Mach-O universal binary with 2 architectures
/usr/bin/perl (for architecture ppc7400):	Mach-O executable ppc
/usr/bin/perl (for architecture i386):	Mach-O executable i386

So, like a good and helpful person, I set about compiling the latest stable version of perl, 5.10.0, on said Mac Pro, with 64-bit mode enabled.

Which of course, wasn't as easy as it sounds. After getting the 'Configure' script to work (which took a few passes), I was greeted with this error from gcc, on the third file that it tried to compile:

amy:perl-5.10.0 research$ make
`sh  cflags "optimize='-O3'" toke.o`  toke.c
	  CCCMD =  cc -DPERL_CORE -c -fno-common -DPERL_DARWIN -no-cpp-precomp \
          -arch ppc64 -I/usr/local/include -arch x86_64  -O3  
toke.c: In function 'Perl_yylex':
toke.c:6633: error: invalid lvalue in unary '&'
toke.c:6633: error: invalid lvalue in unary '&'
...

Instead of drawing this out - I'll skip to the end. If you have the above error, here is the solution - when the 'Configure' script asks you this question:

I can use /usr/bin/nm to extract the symbols from your C libraries. This
is a time consuming task which may generate huge output on the disk (up
to 3 megabytes) but that should make the symbols extraction faster. The
alternative is to skip the 'nm' extraction part and to compile a small
test program instead to determine whether each symbol is present. If
you have a fast C compiler and/or if your 'nm' output cannot be parsed,
this may be the best solution.

You probably shouldn't let me use 'nm' if you are using the GNU C Library.

Shall I use /usr/bin/nm to extract C symbols from the libraries? [y]

At that question - it is super freakin' important that you answer NO. For some reason, this step fails horribly in 64-bit mode on Leopard (it works fine in 32-bit mode). Basically, the 'Configure' script decides that Leopard is missing 200 or so native libc functions, so perl falls back to some hacked up replacements that it has internally. Many of which (all of which?) are quite broken.
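If you would rather not rely on catching that prompt, here is a small sketch for checking afterwards what 'Configure' recorded. It assumes that 'config.sh' stores the answer in a line like usenm='false', and the function name is made up for this example. The comment at the top refers to the -Uusenm switch that perl's INSTALL file describes under "nm extraction"; I have not verified it on Leopard in 64-bit mode, so treat it as a starting point.

```shell
# Non-interactive alternative to answering "n" at the prompt
# (see "nm extraction" in perl's INSTALL file):
#   sh Configure -Uusenm ...your other options...

# Succeeds if a finished Configure run recorded nm extraction as disabled.
# Assumption: config.sh stores the answer as usenm='false' (or 'undef').
nm_extraction_disabled() {
    # $1 = path to config.sh (defaults to ./config.sh)
    grep -Eq "^usenm='?(false|undef)'?" "${1:-config.sh}"
}

# Usage, from the perl source directory after running Configure:
#   nm_extraction_disabled && echo "nm extraction is off"
```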

Once you do that, you should be able to produce a working 64-bit version of perl on Leopard. After the jump, I'll discuss this problem in a bit more detail, and reveal how I figured out just what was going on.

-Andy.

What is gcc really saying here?

If you look at line 6633 in toke.c, you'll see this line of code:

if (memchr(tmpbuf, ':', len))

Looks innocent enough, and I don't see any '&' signs in there at all! I spent some time mucking with different compiler settings, until I found that the same C file compiled just fine on my MacBook, which is 32-bit, and configured to produce a 32-bit perl. Since the files were the same on both machines, I knew that I had some sort of #define issue, caused by a header file.

However, since I didn't know which header file or #define could be causing the issue, I looked at the C-preprocessor version of the file (generated by passing the '-E' flag to gcc) in order to see the actual line of code that was failing. Because of all of the preprocessor branches in toke.c, the pre-processed version of the file looked quite different from the raw source file. It was also over 20,000 lines long. In order to find the offending line of code, I actually needed to have three different versions of the file open - the preprocessor output from my 32-bit build (which worked), from the 64-bit build (didn't work), and the original source file. My editor of choice doesn't parse the C code as you edit it, so it can't tell you what function you're in. The 'Perl_yylex()' function is crazy-long, so I needed an editor which could tell me which function my cursor was in. Hence, I opened up the preprocessed files in Xcode.
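A possible shortcut for this kind of hunt, sketched here and not what I actually did: gcc's '-E' output keeps markers of the form # 6633 "toke.c" (unless you pass -P), so a little awk can pull out the preprocessed text for one source line. The function name and the 'toke.i' file name are made up for this example, and it assumes the source file name contains no spaces.

```shell
# Print the preprocessed text that corresponds to one source line, using
# the '# <line> "<file>"' markers that cc -E leaves in its output.
expanded_line() {
    # $1 = preprocessed file, $2 = source file name, $3 = source line number
    awk -v src="$2" -v want="$3" '
        /^# [0-9]+ "/ {
            # A marker says: the next line of output is line $2 of file $3.
            line = $2 + 0
            file = $3
            gsub(/"/, "", file)
            next
        }
        {
            if (file == src && line == want) print
            line++
        }
    ' "$1"
}

# Usage (toke.i is a hypothetical name for the saved cc -E output):
#   expanded_line toke.i toke.c 6633
```

Running it against both the 32-bit and the 64-bit preprocessor output should show the two expansions of the same line side by side.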

With all of this in place, I was able to determine that the above line of code got translated to the following in my 64-bit build:

if (Perl_ninstr((char*)(tmpbuf), ((char*)(tmpbuf)) + len, &(':'), &(':') + 1))

While I'm not a C guru by any stretch, I think I can see here what gcc is complaining about - that &(':') stuff looks totally whack. A character constant like ':' is just a value, not an lvalue, so there is no address for '&' to take. But, this line of code gave me something to grep for - looking for Perl_ninstr, I found:

amy:$ grep Perl_ninstr *.h
embed.h:#define ninstr			Perl_ninstr
embed.h:#define ninstr(a,b,c,d)		Perl_ninstr(aTHX_ a,b,c,d)
proto.h:PERL_CALLCONV char*	Perl_ninstr(pTHX_ const char* big, const char* bigend, ...

The first hit is interesting - basically perl is defining ninstr() to point to Perl_ninstr(). Looking for just the string "ninstr" reveals:

amy:$ grep ninstr *.h
embed.h:#define ninstr			Perl_ninstr
embed.h:#define rninstr			Perl_rninstr
embed.h:#define ninstr(a,b,c,d)		Perl_ninstr(aTHX_ a,b,c,d)
embed.h:#define rninstr(a,b,c,d)	Perl_rninstr(aTHX_ a,b,c,d)
perl.h:#       define memchr(s,c,n) ninstr((char*)(s), ((char*)(s)) + n, &(c), &(c) + 1)
proto.h:PERL_CALLCONV char*	Perl_ninstr(pTHX_ const char* big, const char* bigend, ...
proto.h:PERL_CALLCONV char*	Perl_rninstr(pTHX_ const char* big, const char* bigend, ...

The hit from 'perl.h' is interesting. Here is the full snippet of code:

#ifndef PERL_MICRO
#ifndef memchr
#   ifndef HAS_MEMCHR
#       define memchr(s,c,n) ninstr((char*)(s), ((char*)(s)) + n, &(c), &(c) + 1)
#   endif
#endif
#endif

Bingo! This code in 'perl.h' is what is making the memchr() function go all unbuildy on my 64-bit machine. But, this will only happen if the "HAS_MEMCHR" preprocessor variable isn't defined. Sure enough, on the 32-bit machine:

areitz@rosetta:/tmp/perl-5.10.0$ grep HAS_MEMCHR config.h
/* HAS_MEMCHR:
#define HAS_MEMCHR	/**/

And on the 64-bit machine:

amy:$ grep HAS_MEMCHR config.h
/* HAS_MEMCHR:
/*#define HAS_MEMCHR	/ **/

Drat! Well, I know that Mac OS X has the 'memchr()' function, even in 64-bit mode. But for some reason, perl thinks that it doesn't, and is trying to substitute in some whack code to compensate. I tried just uncommenting this, and managed to get 'toke.c' to compile. However, a few files later, I encountered this error:

`sh  cflags "optimize='-O3'" util.o`  util.c
	  CCCMD =  cc -DPERL_CORE -c -fno-common -DPERL_DARWIN -no-cpp-precomp \
         -arch ppc64 -fno-common -DPERL_DARWIN -no-cpp-precomp -arch x86_64 \
         -I/usr/local/include  -O3  -Wall 
util.c:1854: error: conflicting types for 'vsprintf'
util.c: In function 'vsprintf':
util.c:1855: error: storage size of 'fakebuf' isn't known
util.c:1869: error: '_IOWRT' undeclared (first use in this function)
util.c:1869: error: (Each undeclared identifier is reported only once
util.c:1869: error: for each function it appears in.)
util.c:1870: warning: implicit declaration of function '_doprnt'
util.c:1855: warning: unused variable 'fakebuf'
util.c:1854: error: conflicting types for 'vsprintf'

Examining this code in 'util.c', I found this helpful comment block:

/* This vsprintf replacement should generally never get used, since
   vsprintf was available in both System V and BSD 2.11.  (There may
   be some cross-compilation or embedded set-ups where it is needed,
   however.)

If you encounter a problem in this function, it's probably a symptom
that Configure failed to detect your system's vprintf() function.
See the section on "item vsprintf" in the INSTALL file.

This version may compile on systems with BSD-ish <stdio.h>,
but probably won't on others.
*/

This is where I found what I mentioned above - that the 'Configure' script was having trouble detecting libc functions. I thought about just uncommenting the right #defines in the 'config.h' file - but I quickly found that perl thought a lot of functions were missing - well over 200 by my count. Since I didn't want to uncomment over 200 lines of C, I decided to muck around with the 'Configure' script until I found the right options, which I have highlighted above.
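If you want to see the damage on your own build, here is a rough sketch for listing the HAS_* symbols that 'Configure' left commented out, and for comparing a good 'config.h' against a bad one. It assumes the commented-out form shown in the grep output above (/*#define HAS_MEMCHR / **/). The function names are made up for this example.

```shell
# List the HAS_* symbols that Configure left commented out in a config.h.
disabled_has_symbols() {
    # $1 = path to config.h
    sed -n 's|^/\*#define[[:space:]]*\(HAS_[A-Za-z0-9_]*\).*|\1|p' "$1" | sort -u
}

# Symbols that are commented out in the second config.h but not in the first.
newly_disabled() {
    # $1 = config.h from the working build, $2 = config.h from the broken one
    good=$(mktemp)
    bad=$(mktemp)
    disabled_has_symbols "$1" > "$good"
    disabled_has_symbols "$2" > "$bad"
    comm -13 "$good" "$bad"
    rm -f "$good" "$bad"
}

# Usage (the paths are placeholders):
#   disabled_has_symbols config.h | wc -l
#   newly_disabled /path/to/32bit/config.h /path/to/64bit/config.h
```

A count in the hundreds on a system that you know has a normal libc is a good hint that the detection step went wrong, not that the functions are really missing.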