Bug #5026

intra-node/inter-zone networking doesn't always deliver SIGPOLL

Added by Dan McDonald over 3 years ago. Updated over 3 years ago.

Status:ClosedStart date:2014-07-18
Priority:NormalDue date:
Assignee:Dan McDonald% Done:

100%

Category:networking
Target version:-
Difficulty:Hard Tags:

Description

NOTE: This was seen with OmniOS r151006, which is about a year old. I will update this if I can reproduce it on more modern Illumos code.

Consider a global-zone process using SIGPOLL handling to detect IO on a network socket. ntpd is a good example.

Consider a sending process on a same-machine non-global zone. It is possible, at the very least where both global and non-global zones have vnics over the same aggregation, that the SIGPOLL is never delivered to the process in the global (receiving) zone.

Some verbose annotated DTrace output:

              ip`ire_recv_local_v4+0x132
              ip`ill_input_short_v4+0x4d6
              ip`ip_input_common_v4+0x372
              ip`ip_input+0x2b
              dls`i_dls_link_rx+0x1cd
              mac`mac_rx_deliver+0x37
              mac`mac_rx_soft_ring_process+0x19a
              mac`mac_rx_srs_proto_fanout+0x29a
              mac`mac_rx_srs_drain+0x363
              mac`mac_rx_srs_process+0x3ce
/* RIGHT HERE we leave a non-global zone and enter a global zone! */
              mac`mac_tx_send+0x431
              mac`mac_tx_soft_ring_process+0x79
              mac`mac_tx_aggr_mode+0x7c
              mac`mac_tx+0xda
              dld`str_mdata_fastpath_put+0x53
              ip`ip_xmit+0x94f
              ip`ire_send_wire_v4+0x3e9
              ip`conn_ip_output+0x190
              ip`udp_output_connected+0x139
              ip`udp_send+0x7b2

  4   | ip_input_local_v4:entry               
  4    -> ip_csum_hdr                         
  4    <- ip_csum_hdr                         Returns 0x0
  4    -> ip_fanout_v4                        
  4      -> ip_input_cksum_v4                 
  4        -> ip_input_sw_cksum_v4            
  4          -> ip_input_cksum_pseudo_v4      
  4          <- ip_input_cksum_pseudo_v4      Returns 0x145d8
  4          -> ip_cksum                      
  4            -> ip_ocsum                    
  4            <- ip_ocsum                    Returns 0xffff
  4          <- ip_cksum                      Returns 0xffff
  4        <- ip_input_sw_cksum_v4            Returns 0x1
  4      <- ip_input_cksum_v4                 Returns 0x1
  4      -> ipcl_classify_v4                  
  4      <- ipcl_classify_v4                  Returns 0xffffff32cde34540
  4      -> udp_input                         
  4        -> ip_find_hdr_v4                  
  4        <- ip_find_hdr_v4                  Returns 0x0
  4        -> conn_recvancillary_size         
  4        <- conn_recvancillary_size         Returns 0x28
  4        -> allocb                          
  4          -> kmem_cache_alloc              
  4          <- kmem_cache_alloc              Returns 0xffffff7530dd1440
  4        <- allocb                          Returns 0xffffffb7db752280
  4        -> conn_recvancillary_add          
  4          -> gethrestime                   
  4            -> pc_gethrestime              
  4              -> gethrtime                 
  4                -> tsc_gethrtime           
  4                <- tsc_gethrtime           Returns 0x6f4ee913f76957
  4              <- gethrtime                 Returns 0x6f4ee913f76957
  4            <- pc_gethrestime              Returns 0x75609f00
  4          <- gethrestime                   Returns 0x75609f00
  4        <- conn_recvancillary_add          Returns 0x28
  4        -> udp_ulp_recv                    
  4          -> so_queue_msg                  
  4            -> so_queue_msg_impl           
/* Message is enqueed, but ... */
  4              -> so_enqueue_msg            
  4              <- so_enqueue_msg            Returns 0xffffffdef28b2060
  4              -> so_notify_data            
  4                -> socket_sendsig          
  4                  -> prfind                
  4                    -> getzoneid           
  4                    <- getzoneid           Returns 0x14
/* This function... */
  4                    -> prfind_zone         
  4                      -> pid_lookup        
  4                      <- pid_lookup        Returns 0xffffff32a5c98420
  4                    <- prfind_zone         Returns 0x0
/* FAILED!!!   Because the receiver's pid's zone isn't the sender's. */
  4                  <- prfind                Returns 0x0
  4                <- socket_sendsig          Returns 0x0
  4              <- so_notify_data            Returns 0x0
  4            <- so_queue_msg_impl           Returns 0xdfb8
  4          <- so_queue_msg                  Returns 0xdfb8
  4        <- udp_ulp_recv                    Returns 0xdfb8
  4      <- udp_input                         Returns 0xdfb8
  4      -> cv_broadcast                      
  4      <- cv_broadcast                      Returns 0x0
  4    <- ip_fanout_v4                        Returns 0x0
  4  <- ip_input_local_v4                     Returns 0x0

recvsig.c Magnifier (960 Bytes) Dan McDonald, 2014-07-18 08:24 PM

History

#1 Updated by Robert Mustacchi over 3 years ago

  • Tags deleted (needs-triage)
  • Assignee set to Dan McDonald

In this case, we have a global zone process/process group that's been set up to receive signals on data. Here, we have a zone on that same machine sending data and it's thread gets co-opted in MAC to turn around and do some data processing for a MAC device that ends up sending packets up to another netstack that isn't the one we're currently in. In general this pattern should be fine; however, the problem is how socket_sendsig tries to look up the proc_t to send something to. Here it's using prfind. prfind implicitly treats the lookup as running in the zone of curproc. And that's exactly the bug.

socket_sendsig cannot operate under the assumption that it's curthread is the correct context to use for authorization. While socket upcalls are processed in the context of a netstack there's no indication that the current thread that is doing the processing in the kernel is in fact, in that zone/netstack.

The sonode actually has a notion of what zone it is in -- the so_zoneid. If we look at all assignments to the so_pgrp, we'll see that we validate that it's assigned in a context where we can actually make this assignment. Therefore, instead of using prfind, we should use prfind_zone and use the zone id in so_zoneid. In general it might actually be okay to use ALL_ZONES, but given that so_zoneid should always properly scope this, even if a global zone socket has ended up in a non-global zone, we should be good to go.

#2 Updated by Dan McDonald over 3 years ago

  • Difficulty changed from Expert to Hard
  • % Done changed from 0 to 50
  • Status changed from New to In Progress

Reproducing this bug is less complicated than I'd first imagined.

Steps to reproduce:

1.) Create two vnics. Give both a VLAN ID (using -v in "dladm create-vnic").

2.) Assign one to the global zone, the other to a non-global zone.

3.) Run the above test program in the global zone.

4.) Use netcat or something else to send a packet from the non-global-zone to the listener in the global zone.

The global zone process will not see a signal for packets sent from the non-global zone. Use an off-link peer as well sending different datagram sizes, and the program will show how it will only read the non-global-zone packets after it's been interrupted by off-link traffic.

#3 Updated by Dan McDonald over 3 years ago

Using Robert's analysis, the fix is straightforward. Using my reproduction, the testing and verification is also straightforward.

diff --git a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c b/usr/src/uts/common
index 00ab44e..957c8f9 100644
--- a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c
+++ b/usr/src/uts/common/fs/sockfs/sockcommon_subr.c
@@ -22,6 +22,9 @@
 /*
  * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
  */
+/*
+ * Copyright 2014, OmniTI Computer Consulting, Inc. All rights reserved.
+ */

 #include <sys/types.h>
 #include <sys/param.h>
@@ -396,7 +399,12 @@ socket_sendsig(struct sonode *so, int event)
                 * the proc is active.
                 */
                mutex_enter(&pidlock);
-               proc = prfind(so->so_pgrp);
+               /*
+                * Even if the thread started in another zone, we're receiving
+                * on behalf of this socket's zone, so find the proc using the
+                * socket's zone ID.
+                */
+               proc = prfind_zone(so->so_pgrp, so->so_zoneid);
                if (proc == NULL) {
                        mutex_exit(&pidlock);
                        return;
@@ -413,7 +421,12 @@ socket_sendsig(struct sonode *so, int event)
                pid_t pgrp = -so->so_pgrp;

                mutex_enter(&pidlock);
-               proc = pgfind(pgrp);
+               /*
+                * Even if the thread started in another zone, we're receiving
+                * on behalf of this socket's zone, so find the pgrp using the
+                * socket's zone ID.
+                */
+               proc = pgfind_zone(pgrp, so->so_zoneid);
                while (proc != NULL) {
                        mutex_enter(&proc->p_lock);
                        socket_sigproc(proc, event);

#4 Updated by Electric Monk over 3 years ago

  • % Done changed from 50 to 100
  • Status changed from In Progress to Closed

git commit d690b62cf13675007342e6bb43d4816ec57a46dd

commit  d690b62cf13675007342e6bb43d4816ec57a46dd
Author: Dan McDonald <danmcd@omniti.com>
Date:   2014-07-24T19:04:32.000Z

    5026 intra-node/inter-zone networking doesn't always deliver SIGPOLL
    Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Approved by: Garrett D'Amore <garrett@damore.org>

Also available in: Atom