1.1 Setting up an MPI development environment on Linux, with fixes for problems encountered along the way

Time: 2021-07-21 20:08:42


1.1.1     Setting up the MPI parallel computing environment

Procedure: see http://blog.csdn.net/nohackccc/article/details/9061275

Problems encountered and how they were solved:

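(1)    Starting mpd with "mpd &" fails with socket.gaierror: [Errno -2] Name…

Cause: hostname = socket.gethostname() returns the host name, but socket.gethostbyname(hostname) cannot resolve that name to an IP address.

Fix: in /etc/hosts, below the existing line "127.0.0.1 localhost", add the line "127.0.0.1 myhostname", replacing myhostname with the machine's actual host name.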
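To check whether the /etc/hosts fix has taken effect, the lookup that mpd performs can be reproduced outside Python. The following is a minimal C++17 sketch for POSIX systems. The helper names resolve_ipv4 and local_hostname are hypothetical, and using getaddrinfo with AF_INET as a stand-in for Python's socket.gethostbyname is an assumption about equivalent behavior, not mpd's own code.

```cpp
#include <arpa/inet.h>   // inet_ntop
#include <netdb.h>       // getaddrinfo, freeaddrinfo, addrinfo
#include <netinet/in.h>  // sockaddr_in, INET_ADDRSTRLEN
#include <sys/socket.h>  // AF_INET
#include <unistd.h>      // gethostname
#include <optional>
#include <string>

// Resolve a host name to its first IPv4 address, roughly the lookup that
// Python's socket.gethostbyname() performs (assumption). std::nullopt
// corresponds to the case in which mpd raises socket.gaierror.
std::optional<std::string> resolve_ipv4(const std::string& host) {
    addrinfo hints{};            // value-initialized: all fields zero
    hints.ai_family = AF_INET;   // IPv4 only, like gethostbyname
    addrinfo* result = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &result) != 0 || result == nullptr) {
        return std::nullopt;     // name could not be resolved
    }
    char buf[INET_ADDRSTRLEN] = {};
    const auto* addr = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
    const bool ok = inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf)) != nullptr;
    freeaddrinfo(result);        // always release the list getaddrinfo allocated
    if (!ok) return std::nullopt;
    return std::string(buf);
}

// The local host name, as socket.gethostname() would report it ("" on failure).
std::string local_hostname() {
    char name[256] = {};
    if (gethostname(name, sizeof(name) - 1) != 0) return std::string();
    return std::string(name);
}
```

If resolve_ipv4(local_hostname()) returns std::nullopt, mpd is likely to hit the same socket.gaierror; once the /etc/hosts entry is in place it should return 127.0.0.1.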

 

1.1.2      Building and installing Boost on Linux

Procedure: see http://blog.csdn.net/nohackccc/article/details/8987268

Problems encountered and how they were solved:

In /root/linux/boost_1_42_0/tools/build/v2, add "using mpi ;" to user-config.jam; for details see the comments in /root/linux/boost_1_42_0/tools/build/v2/tools/mpi.jam.

(1)   Error when building Boost.MPI on Linux

./boost/python/detail/wrap_python.hpp:76:24: error: patchlevel.h: No such file or directory  

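Cause: python-dev (the Python headers and development libraries) is missing. Fix: download python-dev and put its headers and libraries into the corresponding Python directories.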

 

(2)  Building Boost with MPI fails when generating the .so file: failed gcc.link.dll

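Explanations found via Google say:

The cause is a mismatch between gcc and ld, which makes ld fail when producing the .so.

The suggested fix: find user-config.jam, change the gcc configuration to "using gcc : : : <linker-type>sun ;" and rebuild.

However, this did not appear to work.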

The failing command:

    "g++"    -o "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/libboost_mpi.so.1.42.0" -Wl,-h -Wl,libboost_mpi.so.1.42.0 -shared -Wl,--start-group "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/broadcast.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/communicator.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/computation_tree.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/content_oarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/environment.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/exception.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/graph_communicator.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/group.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/intercommunicator.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/mpi_datatype_cache.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/mpi_datatype_oarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/packed_iarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/packed_oarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/packed_skeleton_iarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/packed_skeleton_oarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/point_to_point.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/request.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/text_skeleton_oarchive.o" "bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/timer.o" "bin.v2/libs/serialization/build/gcc-4.3/release/threading-multi/libboost_serialization.so.1.42.0"  -Wl,-Bstatic  -Wl,-Bdynamic -ldl -llam -llammpi++ -llammpio -lmpi -lrt -Wl,--end-group -Wl,--strip-all -pthread 

...failed gcc.link.dll bin.v2/libs/mpi/build/gcc-4.3/release/threading-multi/libboost_mpi.so.1.42.0...



Rerunning that failing link command by hand reveals some clues.



(3)  cannot open shared object file: No such file or directory

For the fix, see http://blog.csdn.net/nohackccc/article/details/9012813
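The "cannot open shared object file" error means the dynamic loader did not find the library in any directory it searched. A rough C++17 sketch of one step of that search, walking a colon-separated list in the LD_LIBRARY_PATH format, can help confirm whether a library is visible from a given path. find_in_search_path is a hypothetical helper; the real loader (ld.so) also consults RPATH/RUNPATH, /etc/ld.so.cache and the default system directories, which this sketch ignores.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Look for a shared library file name in a colon-separated directory list
// (the LD_LIBRARY_PATH format). Returns the first match, or std::nullopt.
// Models only the LD_LIBRARY_PATH step of ld.so's search.
std::optional<fs::path> find_in_search_path(const std::string& search_path,
                                            const std::string& soname) {
    std::istringstream dirs(search_path);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) continue;           // skip empty entries in this sketch
        const fs::path candidate = fs::path(dir) / soname;
        std::error_code ec;                  // non-throwing overload: unreadable dirs just don't match
        if (fs::exists(candidate, ec)) return candidate;
    }
    return std::nullopt;
}
```

For example, passing the value of LD_LIBRARY_PATH together with libboost_mpi.so.1.42.0 (the library name from the build log above) returns std::nullopt when that library's directory is not on the path. Common remedies are adding the directory to LD_LIBRARY_PATH, or listing it in /etc/ld.so.conf and running ldconfig.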


1.1.3      CUDA 5

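(1)  error while loading shared libraries: libcudart.so.3: wrong ELF class: ELFCLASS32

This error appeared when running the first CUDA program I compiled. The cause was that LD_LIBRARY_PATH pointed to the 32-bit library directory instead of the 64-bit one. For the fix, see http://blog.csdn.net/nohackccc/article/details/9038537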
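Whether a given libcudart.so is 32-bit or 64-bit can be read directly from its ELF header: byte 4 of e_ident (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64 (the file command reports the same information, e.g. "ELF 32-bit LSB shared object"). The C++ sketch below checks that byte; elf_class_bits is a hypothetical helper name.

```cpp
#include <fstream>
#include <string>

// Report the ELF class of a file: 32 for ELFCLASS32, 64 for ELFCLASS64,
// 0 if the file is unreadable or not ELF. Bytes 0-3 of e_ident are
// 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) is 1 for 32-bit, 2 for 64-bit.
int elf_class_bits(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char ident[5] = {};
    if (!in.read(ident, sizeof(ident))) return 0;   // missing or too short
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F') {
        return 0;                                   // no ELF magic number
    }
    switch (ident[4]) {
        case 1: return 32;   // ELFCLASS32
        case 2: return 64;   // ELFCLASS64
        default: return 0;   // ELFCLASSNONE or invalid
    }
}
```

Assuming the default /usr/local/cuda layout with separate lib (32-bit) and lib64 directories, a result of 32 for the libcudart.so found via LD_LIBRARY_PATH on a 64-bit program points to exactly the mismatch described above.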

 

1.1.4      InfiniBand technology

(1)  Installing the Mellanox InfiniBand OFED stack on Linux

See http://blog.csdn.net/nohackccc/article/details/8990385

(2)  Configuration

See http://blog.csdn.net/nohackccc/article/details/9005094

1.1.5      Installing the JDK and Eclipse

http://blog.csdn.net/nohackccc/article/details/9012159

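1.1.6      Garbled characters when moving Windows ANSI-encoded source files to Linux

On Linux the files need to be encoded as UTF-8 without BOM. One way is to convert them with an iconv script; see http://blog.csdn.net/nohackccc/article/details/9012971

Alternatively, use a conversion tool (see the attachment).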
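The two properties required here, no BOM and valid UTF-8, are easy to check programmatically. A minimal C++ sketch, assuming the file contents have been read into a std::string: strip_utf8_bom removes a leading EF BB BF byte-order mark, and looks_like_utf8 is a heuristic check that flags text still in a legacy code page (on Chinese Windows, "ANSI" usually means GBK/CP936). Both helper names are hypothetical, and neither transcodes anything; the conversion itself is still done with iconv, e.g. iconv -f GBK -t UTF-8 old.cpp > new.cpp (file names illustrative).

```cpp
#include <cstddef>
#include <string>

// Remove a leading UTF-8 byte-order mark (EF BB BF), if present.
std::string strip_utf8_bom(const std::string& text) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) return text.substr(bom.size());
    return text;
}

// Heuristic structural check: true if the bytes form well-formed UTF-8
// sequences. GBK-encoded Chinese text usually fails it. Overlong forms and
// surrogates are not rejected, to keep the sketch short.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;                             // continuation bytes expected
        if (c < 0x80) extra = 0;                       // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;        // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;        // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;        // 11110xxx
        else return false;                             // stray continuation or invalid lead byte
        if (extra > s.size() - i - 1) return false;    // sequence truncated at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            // every continuation byte must look like 10xxxxxx
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

For example, "中文" encoded as UTF-8 (E4 B8 AD E6 96 87) passes looks_like_utf8, while the same text in GBK (D6 D0 CE C4) fails, which marks the file as one that still needs converting.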

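1.1.7      Windows code failing on SUSE Linux: assigning to a string in a memset struct

After Windows code was moved to SUSE Linux, assigning to a string member of a struct failed at run time because the struct had first been cleared with memset; see http://blog.csdn.net/nohackccc/article/details/9016583. The cause is still to be investigated. The workaround is to comment out the line memset(&struct, 0, sizeof(struct)), after which struct.string can be assigned normally.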
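A likely explanation, offered as an assumption rather than a confirmed diagnosis: std::string is not trivially copyable, so memset over a struct that contains one overwrites the string's internal pointer and size with zeros, which is undefined behavior. With the copy-on-write std::string of older libstdc++ versions, a later assignment works through that zeroed pointer and crashes, while other implementations may happen to tolerate it. Rather than deleting the memset and leaving the plain fields uninitialized, value-initialization zeroes the numeric fields and properly constructs the string. Record, make_record and reset_record below are hypothetical names.

```cpp
#include <string>
#include <type_traits>

// Hypothetical struct of the kind described: plain fields plus a std::string.
struct Record {
    int id = 0;          // default member initializers replace the zeroing memset did
    double value = 0.0;
    std::string name;    // non-trivial member: never memset a struct containing this
};

// memset is only safe for trivially copyable types; this one is not.
static_assert(!std::is_trivially_copyable<Record>::value,
              "Record holds a std::string, so memset(&r, 0, sizeof(r)) is undefined behavior");

// Build a Record without memset: Record{} value-initializes every member.
Record make_record(int id, const std::string& name) {
    Record r{};          // instead of: Record r; memset(&r, 0, sizeof(r));
    r.id = id;
    r.name = name;       // safe: name is a properly constructed empty string
    return r;
}

// Reset an existing Record to its zero state, again without memset.
void reset_record(Record& r) {
    r = Record{};
}
```

The static_assert turns the problem into a compile-time check: any struct for which it fires must not be cleared with memset.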