3354 kernel crash in rpcsec_gss after using gsscred

Review Request #1174 - Created Aug. 25, 2018 and submitted

Andy Fiddaman

In summary, rpcsec_gss makes an asynchronous call because "kgss_accept_sec_context()/gssd(1M) can be overly time consuming". The call is made via a taskq and passes a reference to the underlying rpcmod transport. There is a race in which the transport can be closed before the task completes. The fix is to take a reference on the transport before starting the asynchronous task and release it when the task completes. The reference counting mechanism already exists in rpcmod; it just was not exposed via the transport ops. There is a good analysis of the fault in the issue.
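The hold-before-dispatch pattern described above can be illustrated with a small user-space sketch. This is not the rpcmod or rpcsec_gss code: `xprt_t`, `xprt_hold()`, `xprt_rele()`, `accept_task()` and `demo_async_accept()` are hypothetical names invented for this example, and a pthread stands in for the kernel taskq. It only shows why taking the reference before the task is dispatched keeps the object alive even if the owner closes it first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define	XPRT_MAGIC	0x58505254u	/* marks a live transport (hypothetical) */

/* Hypothetical stand-in for a transport; not the real rpcmod structure. */
typedef struct xprt {
	pthread_mutex_t	lock;
	unsigned	refcnt;		/* outstanding holds, owner included */
	unsigned	magic;
} xprt_t;

typedef struct task_arg {
	xprt_t	*xprt;
	int	saw_valid;	/* set by the task if the transport was live */
} task_arg_t;

static atomic_int xprt_live;	/* number of transports not yet freed */

static xprt_t *
xprt_create(void)
{
	xprt_t *x = malloc(sizeof (*x));

	if (x == NULL)
		return (NULL);
	pthread_mutex_init(&x->lock, NULL);
	x->refcnt = 1;		/* the owner's reference */
	x->magic = XPRT_MAGIC;
	atomic_fetch_add(&xprt_live, 1);
	return (x);
}

static void
xprt_hold(xprt_t *x)
{
	pthread_mutex_lock(&x->lock);
	x->refcnt++;
	pthread_mutex_unlock(&x->lock);
}

static void
xprt_rele(xprt_t *x)
{
	unsigned left;

	pthread_mutex_lock(&x->lock);
	assert(x->refcnt > 0);
	left = --x->refcnt;
	pthread_mutex_unlock(&x->lock);

	if (left == 0) {	/* last reference gone: safe to free */
		x->magic = 0;
		pthread_mutex_destroy(&x->lock);
		free(x);
		atomic_fetch_sub(&xprt_live, 1);
	}
}

/* Stand-in for the slow asynchronous work done on the taskq. */
static void *
accept_task(void *arg)
{
	task_arg_t *ta = arg;

	/* The slow call would happen here; the hold keeps xprt valid. */
	ta->saw_valid = (ta->xprt->magic == XPRT_MAGIC);
	xprt_rele(ta->xprt);	/* drop the task's reference when done */
	return (NULL);
}

/* Returns 1 if the task saw a live transport, -1 on setup failure. */
static int
demo_async_accept(void)
{
	xprt_t *x = xprt_create();
	task_arg_t ta = { x, 0 };
	pthread_t tid;

	if (x == NULL)
		return (-1);
	xprt_hold(x);		/* take the hold BEFORE dispatching */
	if (pthread_create(&tid, NULL, accept_task, &ta) != 0) {
		xprt_rele(x);	/* dispatch failed: undo the task's hold */
		xprt_rele(x);
		return (-1);
	}
	xprt_rele(x);		/* owner closes while the task may still run */
	pthread_join(tid, NULL);
	return (ta.saw_valid);
}
```

Calling `demo_async_accept()` from a `main()` exercises the race window: the owner drops its reference immediately after dispatch, and the transport is freed only when the task releases the last hold. Without the `xprt_hold()` call, the owner's `xprt_rele()` could free the structure while the task is still using it.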

I wrote a fix, and then Marcel Telka pointed me at his work-in-progress fix, which took the same approach. I incorporated some of his ideas too, hence the two copyright lines.

To make it easier to review, the cstyle fixes are in the second diff.

The updated rpcmod and rpcsec_gss modules have been running successfully on several busy OmniOS NFS servers, including one that previously crashed with the reported bug every 1-2 weeks.

Andy Fiddaman
Toomas Soome
carlos neira
Andy Fiddaman
Review request changed

Status: Closed (submitted)