This is a really quick post to share the resolution for an issue I encountered today with VCF 4.3.
I was using Cloud Builder to bring up the management domain for a new VCF 4.3 instance for a customer. Everything was going well until the bring-up failed, after multiple attempts, on the deployment of the SDDC Manager appliance. In vCenter you could see the virtual appliance being deployed, reconfigured, and powered on, and then, almost immediately after it had completed the boot process, the VM was deleted. This cycle repeated a few times before the task finally came up as failed in Cloud Builder with a message that the SDDC Manager VM was not ready.
Looking at the vcf-bringup-debug.log file on Cloud Builder, I found the errors “unable to create jsch cli session auth fail” and “unable to connect to ssh server@<sddc manager vm name>”.
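If you want to check for the same symptoms without scrolling through the whole log, a grep along these lines should surface them. This is a minimal sketch: the function name find_ssh_auth_errors is made up for this example, and the patterns are based on the messages quoted above, so the exact wording in your log may differ slightly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the SSH/auth errors seen during a failed bring-up.
# Defaults to the Cloud Builder debug log path used in this post.
find_ssh_auth_errors() {
  local log="${1:-/opt/vmware/bringup/logs/vcf-bringup-debug.log}"
  # -i: ignore case, -n: prefix line numbers, -E: extended regex (for | and .*)
  grep -inE 'jsch.*auth fail|unable to connect to ssh' "$log"
}
```

Run find_ssh_auth_errors on Cloud Builder with no argument to search the default log, or pass a copy of the log as the first argument.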
The errors suggested that the credentials provided for the SDDC Manager VM were being rejected. Because the VM is destroyed as soon as the error occurs, it was a race to SSH in as the vcf account before it disappeared. When I finally managed to try it in time, the login was refused.
After searching VMware's internal support and knowledge systems and Google without success, I searched our internal Slack channels and found two other people reporting the same problem. Luckily for me there was a fix. The root cause is that the simple password we had chosen was flagged as a dictionary word by the appliance OS and was therefore not considered complex enough. Unfortunately, this is not caught by the validation checks in the deployment parameters spreadsheet (Excel), nor by the validation tasks Cloud Builder runs once the spreadsheet is uploaded.
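Since neither validation step catches this, it may be worth sanity-checking a candidate password yourself before bring-up. The sketch below is only a rough heuristic and is not the appliance's actual check: the function name precheck_password, the 8-character minimum, and the word list you pass in (for example /usr/share/dict/words, if your workstation has one) are all assumptions. It applies basic complexity rules, then strips everything except letters and rejects the password if what remains is a plain dictionary word, which is the kind of password that tripped us up.

```shell
#!/usr/bin/env bash
# Rough, hypothetical password pre-check. It does NOT reproduce the SDDC
# Manager appliance's own rules; it only flags obvious problems early.
# Usage: precheck_password 'candidate' /path/to/wordlist
precheck_password() {
  local pw="$1" wordlist="$2" letters
  # Basic complexity rules (assumed, not taken from VCF documentation).
  if (( ${#pw} < 8 )); then echo "too short"; return 1; fi
  [[ $pw =~ [[:lower:]] ]] || { echo "no lowercase letter"; return 1; }
  [[ $pw =~ [[:upper:]] ]] || { echo "no uppercase letter"; return 1; }
  [[ $pw =~ [[:digit:]] ]] || { echo "no digit"; return 1; }
  [[ $pw =~ [^[:alnum:]] ]] || { echo "no special character"; return 1; }
  # Keep only the letters, lowercased, e.g. 'Password123!' -> 'password'.
  letters=$(printf '%s' "$pw" | tr -cd '[:alpha:]' | tr '[:upper:]' '[:lower:]')
  # Reject if those letters form a whole word in the list (case-insensitive).
  if grep -qixF -- "$letters" "$wordlist"; then
    echo "letters form a dictionary word: $letters"; return 1
  fi
  echo "ok"
}
```

For example, precheck_password 'Password123!' /usr/share/dict/words rejects the password because its letters spell “password”, provided that word is in your list. Passing this check does not guarantee the appliance will accept the password.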
There is a bug ticket logged for this to be resolved in an upcoming release so that the validation will alert you to any issues with the password.
The solution is to use a more complex password, and then restart the failed operation via an API call. If you are in any doubt about performing these steps contact VMware support for assistance.
Here are the steps I used to resolve it for my customer:
- Log in to Cloud Builder via SSH and browse to /tmp
- Locate the bring-up JSON file, which is named sddcspec-<random id>.json
- Edit the JSON file, update the SDDC Manager passwords to the new, more complex value, and save the changes
- Open /opt/vmware/bringup/logs/vcf-bringup-debug.log and find the entry for the failed task. It will read “End of Orchestration with FAILURE for Execution ID <UUID>”
- Note the UUID shown in that entry; this is the ID of the failed task
- From the Cloud Builder appliance, run the following command, replacing the placeholders with the Cloud Builder admin password, the UUID, and the JSON file name: curl -k -u admin:'<cloud builder admin account password>' -X PATCH https://localhost/v1/sddcs/<UUID> -H "Content-Type: application/json" -d "@/tmp/<JSON File Name>"
- Cloud Builder prints the JSON spec to the command line and reports that the task is in progress. If you refresh the Cloud Builder GUI, you should see it retrying the failed task.
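If you need to repeat this more than once, the lookup and retry steps above can be wrapped in a few shell functions. This is a hypothetical sketch, not a VMware-provided tool: the function names are my own, it assumes it runs on the Cloud Builder appliance with the default log location and the spec file still in /tmp, and it assumes you have already edited the SDDC Manager passwords in the JSON file by hand. Leaving the password off -u admin makes curl prompt for it, which keeps it out of your shell history.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the manual retry steps above.
# Assumes: run on Cloud Builder, default log path, spec file still in /tmp,
# and the SDDC Manager passwords in the JSON already edited by hand.

# Print the most recent failed execution ID found in the bring-up debug log.
latest_failed_execution_id() {
  local log="${1:-/opt/vmware/bringup/logs/vcf-bringup-debug.log}"
  grep -oE 'End of Orchestration with FAILURE for Execution ID [0-9a-fA-F-]{36}' "$log" \
    | tail -n 1 \
    | awk '{print $NF}'
}

# Print the most recently modified sddcspec-*.json in a directory (default /tmp).
latest_sddc_spec() {
  local dir="${1:-/tmp}" f newest=""
  for f in "$dir"/sddcspec-*.json; do
    [[ -e $f ]] || continue            # glob matched nothing
    if [[ -z $newest || $f -nt $newest ]]; then newest=$f; fi
  done
  [[ -n $newest ]] && printf '%s\n' "$newest"
}

# PATCH the edited spec back to Cloud Builder to retry the failed task.
retry_bringup() {
  local id="$1" spec="$2"
  [[ -n $id ]] || { echo "no execution ID given" >&2; return 1; }
  [[ -f $spec ]] || { echo "spec file not found: $spec" >&2; return 1; }
  # -u admin without a password makes curl prompt for it;
  # -k skips certificate verification, as in the original command.
  curl -k -u admin -X PATCH "https://localhost/v1/sddcs/${id}" \
    -H "Content-Type: application/json" \
    -d "@${spec}"
}
```

For example: retry_bringup "$(latest_failed_execution_id)" "$(latest_sddc_spec)". Check the ID and file it picks before running it, especially if /tmp contains more than one sddcspec file.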