Rockstar runs successfully, followed by a stream of connection attempt failures

Issue #4 (status: new)
Drew Jamieson created an issue

I'm running rockstar with the following PBS script:

#!/bin/bash
#PBS -l nodes=8:ppn=28,walltime=08:00:00
#PBS -q long

rockstar -c rs.cfg &
mpiexec rockstar -c auto-rockstar.cfg

exit 0

Rockstar runs successfully, generating the list of halos and their properties for each simulation snapshot. However, it continues to run and outputs hundreds of connection attempt failure messages:

[Warning] Connection attempt 6 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 7 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 5 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 7 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 7 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 8 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 6 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 6 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 9 to sn153:34383 failed: : Connection refused
[Error] Failed to connect to sn153:34383! (Err: Connection refused; This error may mean that the connection was refused.)
[Warning] Connection attempt 6 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 6 to sn153:34383 failed: : Connection refused
[Warning] Connection attempt 9 to sn153:34383 failed: : Connection refused
...

It continues issuing these warnings and errors for a long time. Am I running Rockstar incorrectly, or is this a bug?

Comments (3)

  1. HyeongHan Kim

    I’m having the same problem. If it’s not too late, could someone provide a solution for it?

  2. Drew Jamieson (reporter)

    Hi,

    I solved this issue for myself a while ago. It turns out I was running rockstar incorrectly.

    Are you trying to run on multiple nodes with multiple cores, or just on one node with multiple cores? If it is the latter, I would run:

    rockstar -c rs.cfg &
    sleep 1
    rockstar -c OUTBASE/auto-rockstar.cfg &
    

    The first command launches a server process that writes the auto-rockstar.cfg file to the OUTBASE directory and then waits for client processes to connect, so it must run in the background. I find that the sleep 1 command avoids problems with the second process starting too quickly, before auto-rockstar.cfg has been written. Make sure you replace OUTBASE with the path set in your rs.cfg file.
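
    A fixed sleep 1 may still be too short if the filesystem is slow. Below is a minimal sketch, assuming bash and a rockstar binary on the PATH, that instead polls for auto-rockstar.cfg before starting the client. The helper names wait_for_file and run_rockstar_single_node are hypothetical, and the 60-second timeout is an arbitrary choice.

```bash
#!/bin/bash
# Sketch: single-node Rockstar run. Starts the server, waits until it has
# written auto-rockstar.cfg (instead of a fixed "sleep 1"), then starts the
# client and blocks until both processes exit.

# Poll once per second until PATH exists; fail after TIMEOUT seconds.
wait_for_file() {
  local path=$1
  local timeout=${2:-60}
  local waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# CFG is the server config (e.g. rs.cfg); OUTBASE must equal OUTBASE in it.
run_rockstar_single_node() {
  local cfg=$1
  local outbase=$2
  # Remove a stale copy from an earlier run so the client doesn't start early.
  rm -f "$outbase/auto-rockstar.cfg"
  rockstar -c "$cfg" &
  local server_pid=$!
  if ! wait_for_file "$outbase/auto-rockstar.cfg" 60; then
    kill "$server_pid"
    return 1
  fi
  rockstar -c "$outbase/auto-rockstar.cfg" &
  # Keep the script (and any batch job) alive until server and client exit.
  wait
}

# Usage: bash run_rockstar.sh rs.cfg /path/to/OUTBASE
if [ "$#" -eq 2 ]; then
  run_rockstar_single_node "$1" "$2"
fi
```

    The script name run_rockstar.sh is only illustrative. The final wait matters in a batch job: if the job script exits while both processes are still in the background, the scheduler may kill them.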

    If you want to run on multiple nodes and cores, you will need to change the third command, which launches the client processes. You need to launch exactly one client process per node; Rockstar then forks additional processes internally on each node to run in parallel, according to how you set up your rs.cfg file. You can do this with an mpirun command such as:

    rockstar -c rs.cfg &
    sleep 1
    mpirun -np num_nodes -ppn 1 rockstar -c OUTBASE/auto-rockstar.cfg &
    

    Here num_nodes is the number of nodes you want to run on, and -ppn 1 sets the number of processes per node to 1. I tested this on the cluster I use and it seems to work, although I usually just run on one node with 24 cores, which takes about 5 minutes for a single simulation snapshot with 1024^3 particles.
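
    For a PBS/Torque job like the one in the original question, the node count can be derived from $PBS_NODEFILE instead of being hard-coded. The sketch below assumes MPICH or Intel MPI, whose mpirun accepts -ppn; the function names and the /path/to/OUTBASE placeholder are hypothetical, and depending on your MPI setup you may also need to pass the node file to mpirun explicitly. Unlike the commands above, it runs mpirun in the foreground and then waits on the server, so the job script does not exit early.

```bash
#!/bin/bash
#PBS -l nodes=8:ppn=28,walltime=08:00:00
#PBS -q long
# Sketch: multi-node Rockstar job. One server, then exactly one client per
# node via mpirun; each client forks further workers as configured in rs.cfg.

# Count distinct hosts in a PBS node file (it lists each host once per core).
count_nodes() {
  sort -u "$1" | wc -l | tr -d ' '
}

run_rockstar_multi_node() {
  local cfg=$1 outbase=$2 nodefile=$3
  local nodes
  nodes=$(count_nodes "$nodefile")
  rm -f "$outbase/auto-rockstar.cfg"
  rockstar -c "$cfg" &
  local server_pid=$!
  # Wait up to 60 s for the server to write the client config.
  local i
  for ((i = 0; i < 60; i++)); do
    [ -e "$outbase/auto-rockstar.cfg" ] && break
    sleep 1
  done
  if [ ! -e "$outbase/auto-rockstar.cfg" ]; then
    echo "Server did not write auto-rockstar.cfg" >&2
    kill "$server_pid"
    return 1
  fi
  # -ppn 1 is MPICH/Intel MPI (Hydra) syntax; Open MPI would use
  # --map-by ppr:1:node instead.
  mpirun -np "$nodes" -ppn 1 rockstar -c "$outbase/auto-rockstar.cfg"
  # Wait for the server to finish before the job script exits.
  wait "$server_pid"
}

# Only runs inside a PBS job. OUTBASE is a hypothetical placeholder and
# must match OUTBASE in rs.cfg.
if [ -n "${PBS_NODEFILE:-}" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  run_rockstar_multi_node rs.cfg "${OUTBASE:-/path/to/OUTBASE}" "$PBS_NODEFILE"
fi
```

    With nodes=8:ppn=28, this starts 8 clients rather than one per core, which is what launching mpiexec without a per-node limit would typically do.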

  3. HyeongHan Kim

    Hi, thanks a lot for the detailed explanation. I've tried both approaches but couldn't get either to run successfully. I suspect the privacy settings on macOS are causing the problem. If I resolve it, I'll leave a comment for anyone else having the same problem.
