azslurm scale fails with ModuleNotFoundError: No module named 'slurmcc' and NFS not mounted on scheduler VM

Harish Gudla 25 Reputation points
2025-11-12T12:01:25.5766667+00:00

I’m trying to configure/use autoscaling on an Azure HPC Slurm cluster, but on the scheduler VM azslurm scale fails with a missing Python module, and it also looks like the expected NFS share(s) are not mounted.

Environment:

  • Node: compular-scheduler (Slurm scheduler VM on Azure)
  • Running as root via sudo -i

Disk and mounts look like this:

root@compular-scheduler:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        62G   36G   27G  58% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      105M  6.1M   99M   6% /boot/efi
tmpfs           1.6G  4.0K  1.6G   1% /run/user/1000

I was expecting to see one or more NFS-mounted filesystems for the shared storage (e.g. /shared, /apps, or similar), but they do not appear in df -h on the scheduler.

When I run azslurm scale I get: <private data>

The azslurm script points to a virtual environment under /opt/azurehpc/slurm/venv and imports slurmcc:

$ head /opt/azurehpc/slurm/venv/bin/azslurm
#!/opt/azurehpc/slurm/venv/bin/python

import os

if "SCALELIB_LOG_USER" not in os.environ:
    os.environ["SCALELIB_LOG_USER"] = "slurm"
if "SCALELIB_LOG_GROUP" not in os.environ:
    os.environ["SCALELIB_LOG_GROUP"] = "slurm"

from slurmcc.cli import main

Python in that venv is:

$ which python
/opt/azurehpc/slurm/venv/bin/python

But slurmcc does not seem to be installed there:

$ python -m pip list | grep -i slurm
# (no output)

$ python -m pip show slurmcc
WARNING: Package(s) not found: slurmcc

So it looks like the azslurm CLI is present, but the underlying slurmcc package is missing from /opt/azurehpc/slurm/venv. At the same time, the scheduler VM does not show any NFS-mounted shared storage in df -h, which might indicate the Slurm/Azure integration or provisioning did not complete correctly.

My questions:

  1. What is the correct way to (re)install or repair the slurmcc package and azslurm environment on an Azure HPC Slurm scheduler VM?
  2. Is there an official script/extension or documented procedure to re-run the Azure Slurm connector / autoscaling installation on an existing scheduler without breaking the cluster?
  3. Should the scheduler normally have NFS-mounted shared storage visible in df -h (e.g. /shared, /apps, or similar)? If yes, what is the recommended way to verify and/or re-mount the expected NFS shares on the scheduler node? I have important data on the NFS disk, so I want to avoid losing it.

Any guidance on restoring a working azslurm scale command and ensuring the scheduler’s NFS mounts are correctly configured would be appreciated.

Azure CycleCloud
A Microsoft tool for creating, managing, operating, and optimizing high-performance computing (HPC) and big compute clusters in Azure.

Answer accepted by question author
  1. Himanshu Shekhar 1,860 Reputation points Microsoft External Staff Moderator
    2025-11-14T13:00:29.7033333+00:00

    The user resolved the slurmcc and azslurm issues on the scheduler node by downloading the cyclecloud-slurm Python package from the Azure GitHub repository (https://github.com/Azure/cyclecloud-slurm) and manually installing the slurmcc module into the azslurm venv. After the reinstall, the azslurm CLI works normally.
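
One way to confirm that a reinstall like this took effect is to ask the venv's interpreter whether it can locate the module. The following is a minimal sketch, not part of the cyclecloud-slurm tooling: the helper name module_available is hypothetical, and it assumes the script is run with /opt/azurehpc/slurm/venv/bin/python so that the venv's site-packages is searched.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package is missing or the name is malformed.
        return False


if __name__ == "__main__":
    # slurmcc.cli is the module that the azslurm entry script imports.
    for mod in ("slurmcc", "slurmcc.cli"):
        status = "found" if module_available(mod) else "MISSING"
        print(f"{mod}: {status} (interpreter: {sys.executable})")
```

If both names are reported as found by the venv interpreter, the import at the top of the azslurm script should no longer raise ModuleNotFoundError.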

    Regarding the missing NFS-mounted shared storage: neither /shared nor /sched was mounted automatically after the VM restarted. The user remounted the volumes manually, and the scheduler is now operating as expected.
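
To check after a reboot whether the expected mount points are present, the kernel's mount table can be inspected directly. This is a minimal sketch, assuming a Linux scheduler that exposes /proc/mounts and that /shared and /sched are the expected mount points, as reported above. The function names are hypothetical. It reports the filesystem type rather than assuming it is NFS.

```python
from pathlib import Path


def mounted_filesystems(mounts_text: str) -> dict:
    """Map each mount point to its filesystem type, given /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, filesystem type, options, ...
            result[fields[1]] = fields[2]
    return result


def missing_mounts(mounts_text: str, expected=("/shared", "/sched")) -> list:
    """Return the expected mount points that do not appear in the mount table."""
    mounted = mounted_filesystems(mounts_text)
    return [path for path in expected if path not in mounted]


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        table = mounted_filesystems(proc_mounts.read_text())
        for path in ("/shared", "/sched"):
            if path in table:
                print(f"{path}: mounted ({table[path]})")
            else:
                print(f"{path}: NOT mounted")
    else:
        print("/proc/mounts not found; this sketch assumes Linux")
```

A mount point that shows up here but not after the next reboot suggests the mount is not persisted, which matches the behavior described in this answer.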


1 additional answer

  1. Q&A Assist
    2025-11-12T12:02:04.3033333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To address the issues you're facing with the azslurm scale command and the missing NFS mounts on your Azure HPC Slurm scheduler VM, here are some steps and considerations:

    1. Reinstall or Repair the slurmcc Package:
      • The slurmcc package is missing from the virtual environment, so it needs to be reinstalled there. Make sure you have the necessary permissions, then install it with the venv's own pip so that it lands in the environment azslurm uses:
             source /opt/azurehpc/slurm/venv/bin/activate
             pip install slurmcc
        
      • slurmcc ships with the cyclecloud-slurm project (https://github.com/Azure/cyclecloud-slurm). If pip cannot find it in your configured package index, download the package from that project and install it from the local file instead, following that project's installation instructions so that all dependencies are met.
    2. Re-run Azure Slurm Connector / Autoscaling Installation:
      • This answer cannot point to a specific documented procedure for re-running the Azure Slurm connector or autoscaling installation on an existing scheduler without breaking the cluster. Check the Azure CycleCloud documentation or support resources for guidance on reinstalling or repairing the Slurm setup. If you still have the original installation scripts or commands, running them again may help.
    3. NFS-Mounted Shared Storage:
      • Yes, the scheduler should normally have NFS-mounted shared storage visible in df -h, such as /shared or /apps. If these mounts are missing, it indicates that the NFS shares may not have been set up correctly during the cluster configuration.
      • To verify and re-mount the expected NFS shares, you can check the NFS server configuration and ensure that the shares are correctly exported. You can also manually mount them using:
             mount -t nfs <nfs-server-ip>:/path/to/share /shared
        
      • Ensure that the NFS server is reachable and that the appropriate firewall rules are in place to allow NFS traffic.
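
The reachability check in the last point can be sketched as a plain TCP connection test. This is a minimal illustration, assuming the NFS server listens on the standard NFS TCP port 2049. The function name port_open and the script name in the usage message are hypothetical, and the server address is whatever you would substitute for <nfs-server-ip>.

```python
import socket
import sys


def port_open(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_nfs_port.py <nfs-server-ip>")
    else:
        host = sys.argv[1]
        state = "reachable" if port_open(host) else "NOT reachable"
        print(f"{host}:2049 is {state}")
```

A successful connection only shows that the port is reachable through the network and firewall. It does not prove that the share is exported to this host, so a mount attempt can still fail for other reasons.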

    By following these steps, you should be able to restore functionality to the azslurm scale command and ensure that the NFS mounts are correctly configured on your scheduler node.

