0. 简介

传统的并发编程模型基于对线程间共享内存的同步访问控制:共享数据受锁的保护,线程需要争夺这些锁才能访问数据。通常而言,使用线程安全的数据结构可以让这件事变得更容易一些。Go的并发原语(goroutine和channel)则提供了一种优雅的方式来构建并发模型。Go鼓励在goroutine之间使用channel来传递数据,而不是显式地使用锁来限制对共享数据的访问。

Do not communicate by sharing memory; instead, share memory by communicating.

这就是Go的并发哲学,它源自CSP(Communicating Sequential Processes)模型,这一模型也经常被认为是Go在并发编程上取得成功的关键因素。

如果说goroutine是Go语言程序的并发体,那么channel就是它们之间的通信机制。前面的系列博客已经介绍了goroutine及其调度机制,本文将介绍二者之间的通信机制:channel。

1. channel数据结构

type hchan struct {
   qcount   uint           // total data in the queue
   dataqsiz uint           // size of the circular queue
   buf      unsafe.Pointer // points to an array of dataqsiz elements
   elemsize uint16
   closed   uint32
   elemtype *_type // element type
   sendx    uint   // send index
   recvx    uint   // receive index
   recvq    waitq  // list of recv waiters
   sendq    waitq  // list of send waiters

   // lock protects all fields in hchan, as well as several
   // fields in sudogs blocked on this channel.
   //
   // Do not change another G's status while holding this lock
   // (in particular, do not ready a G), as this can deadlock
   // with stack shrinking.
   lock mutex
}

在runtime/chan.go中,channel被定义为如上的hchan结构体,其中:

  • buf:有缓存的channel用来存储缓存数据的指针,指向一个底层为数组的循环队列;
  • dataqsiz:上述循环队列的最大容量,对应cap();
  • qcount:上述循环队列中当前已有的元素个数,对应len();
  • recvx和sendx:分别表示上述循环队列中下一次接收和下一次发送的位置;
  • recvq和sendq:分别是等待接收和等待发送的goroutine抽象(sudog)队列,是个双向链表;
  • lock:互斥锁,用来保证channel数据的线程安全。
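
为了更直观地理解这些字段,下面给出一个笔者自行编写的用户层小例子(仅用于说明字段含义,不涉及runtime内部细节):len()和cap()的返回值分别对应hchan中的qcount和dataqsiz。

package main

import "fmt"

func main() {
   ch := make(chan int, 3) // dataqsiz = 3
   ch <- 1
   ch <- 2 // 此时缓存中有两个元素,qcount = 2

   fmt.Println(len(ch), cap(ch)) // 输出 2 3,分别对应qcount和dataqsiz
}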

2. channel创建

func makechan64(t *chantype, size int64) *hchan {
   if int64(int(size)) != size {
      panic(plainError("makechan: size out of range"))
   }

   return makechan(t, int(size))
}

func makechan(t *chantype, size int) *hchan {
   elem := t.elem

   // compiler checks this but be safe.
   if elem.size >= 1<<16 {
      throw("makechan: invalid channel element type")
   }
   if hchanSize%maxAlign != 0 || elem.align > maxAlign {
      throw("makechan: bad alignment")
   }

   mem, overflow := math.MulUintptr(elem.size, uintptr(size))
   if overflow || mem > maxAlloc-hchanSize || size < 0 {
      panic(plainError("makechan: size out of range"))
   }

   // Hchan does not contain pointers interesting for GC when elements stored in buf do not contain pointers.
   // buf points into the same allocation, elemtype is persistent.
   // SudoG's are referenced from their owning thread so they can't be collected.
   // TODO(dvyukov,rlh): Rethink when collector can move allocated objects.
   var c *hchan
   switch {
   case mem == 0:
      // Queue or element size is zero.
      c = (*hchan)(mallocgc(hchanSize, nil, true))
      // Race detector uses this location for synchronization.
      c.buf = c.raceaddr()
   case elem.ptrdata == 0:
      // Elements do not contain pointers.
      // Allocate hchan and buf in one call.
      c = (*hchan)(mallocgc(hchanSize+mem, nil, true))
      c.buf = add(unsafe.Pointer(c), hchanSize)
   default:
      // Elements contain pointers.
      c = new(hchan)
      c.buf = mallocgc(mem, elem, true)
   }

   c.elemsize = uint16(elem.size)
   c.elemtype = elem
   c.dataqsiz = uint(size)
   lockInit(&c.lock, lockRankHchan)

   if debugChan {
      print("makechan: chan=", c, "; elemsize=", elem.size, "; dataqsiz=", size, "\n")
   }
   return c
}

所有的调用最后都会走到runtime.makechan函数。这个函数做的事情比较简单,就是初始化一个runtime.hchan对象;和map一样,channel对外就是一个指针(切片和字符串则不是指针对象,以切片为例,可以参考链接)。可以看到:

  • 如果当前channel没有缓存,或者元素类型的大小为0,那么只会为runtime.hchan本身分配一段内存;
  • 如果当前channel中存储的元素类型不含指针,那么会为runtime.hchan和底层的缓存数组一次性分配一块连续的内存;
  • 其他情况下,则为runtime.hchan和其缓存各自分配一段内存。
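
下面是笔者给出的一个简单对照(仅为示意,三种make方式分别对应上述三个分配分支,具体分配策略属于runtime内部实现,可能随版本变化):

package main

func main() {
   c1 := make(chan struct{})  // 元素大小为0,mem == 0:只为hchan本身分配内存
   c2 := make(chan int, 8)    // 元素不含指针:hchan和buf一次性连续分配
   c3 := make(chan *int, 8)   // 元素含指针:hchan与buf各自分配,buf需要GC单独扫描
   _, _, _ = c1, c2, c3
}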

3. 数据发送

// entry point for c <- x from compiled code
//go:nosplit
func chansend1(c *hchan, elem unsafe.Pointer) {
   chansend(c, elem, true, getcallerpc())
}

向channel发送数据时,编译器会将 c <- x 转换为对runtime.chansend1的调用,而chansend1只是简单地调用了runtime.chansend。该函数比较长,我们一点一点分析:

3.1 nil通道的数据发送

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   if c == nil {
      if !block {
         return false
      }
      gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
      throw("unreachable")
   }

   ...
}

可以看到,如果通道是nil,那么往这个通道中写数据时:

  • 非阻塞写会直接返回false(使用单channel发送+default分支的select时,编译器会调用runtime.selectnbsend,此时block参数为false);
  • 阻塞写(正常的ch <- v)则会通过gopark函数让出CPU调度权,永久阻塞此goroutine(见下面的示例)。
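
下面是笔者编写的一个小示例,用来演示这两种情况(对nil通道的阻塞写只以注释说明,否则程序会因所有goroutine都被阻塞而报deadlock):

package main

import "fmt"

func main() {
   var ch chan int // nil channel

   // 带default分支的select编译后会走runtime.selectnbsend,即block == false的非阻塞发送
   select {
   case ch <- 1:
      fmt.Println("sent")
   default:
      fmt.Println("nil channel,非阻塞发送直接返回false,走default分支")
   }

   // 而普通的 ch <- 1 会通过gopark永久阻塞当前goroutine,
   // 如果所有goroutine都被阻塞,runtime会报 "all goroutines are asleep - deadlock!"
}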

3.2 直接发送

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...

   if c.closed != 0 {
      unlock(&c.lock)
      panic(plainError("send on closed channel"))
   }

   if sg := c.recvq.dequeue(); sg != nil {
      // Found a waiting receiver. We pass the value we want to send
      // directly to the receiver, bypassing the channel buffer (if any).
      send(c, sg, ep, func() { unlock(&c.lock) }, 3)
      return true
   }

   ...
}

可以发现,向已经关闭的channel发送数据会直接panic。

如果目标channel没有关闭,且有已经处于读等待的goroutine,那么会直接从recvq中取出最先陷入等待的goroutine,并通过runtime.send函数向其发送数据:

func send(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
   if raceenabled {
      if c.dataqsiz == 0 {
         racesync(c, sg)
      } else {
         // Pretend we go through the buffer, even though
         // we copy directly. Note that we need to increment
         // the head/tail locations only when raceenabled.
         racenotify(c, c.recvx, nil)
         racenotify(c, c.recvx, sg)
         c.recvx++
         if c.recvx == c.dataqsiz {
            c.recvx = 0
         }
         c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
      }
   }
   if sg.elem != nil {
      sendDirect(c.elemtype, sg, ep)
      sg.elem = nil
   }
   gp := sg.g
   unlockf()
   gp.param = unsafe.Pointer(sg)
   sg.success = true
   if sg.releasetime != 0 {
      sg.releasetime = cputicks()
   }
   goready(gp, skip+1)
}

可以看到,以上函数做了两件事:

  • 调用sendDirect函数将待发送的数据直接拷贝到接收goroutine的接收变量所在的内存地址上;
  • 通过goready函数唤醒接收goroutine,将其状态置为_Grunnable,并放到当前处理器运行队列的下一个待执行位置(runnext),等待被调度(见下面的示例)。
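
下面用一个笔者编写的示例来对应这条"直接发送"路径:接收方先阻塞在无缓存channel上(进入recvq),随后发送方发现recvq非空,直接把数据拷贝给它并将其唤醒。其中的Sleep仅为演示用,真实代码不应依赖这种时序假设。

package main

import (
   "fmt"
   "sync"
   "time"
)

func main() {
   ch := make(chan int) // 无缓存channel
   var wg sync.WaitGroup
   wg.Add(1)

   go func() {
      defer wg.Done()
      fmt.Println("received:", <-ch) // 接收方先进入recvq等待
   }()

   time.Sleep(100 * time.Millisecond) // 仅为演示:让接收方先挂起
   ch <- 42                           // 发送方发现recvq非空,走send()直接拷贝并goready唤醒接收方
   wg.Wait()
}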

3.3 缓存区

如果没有已经处于读等待的goroutine,且创建的channel包含缓存,并且缓存还没有满,那么会执行以下代码:

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...
   if c.qcount < c.dataqsiz {
      // Space is available in the channel buffer. Enqueue the element to send.
      qp := chanbuf(c, c.sendx)
      if raceenabled {
         racenotify(c, c.sendx, nil)
      }
      typedmemmove(c.elemtype, qp, ep)
      c.sendx++
      if c.sendx == c.dataqsiz {
         c.sendx = 0
      }
      c.qcount++
      unlock(&c.lock)
      return true
   }
   ...
}

这里会首先通过runtime.chanbuf计算出缓存区中下一个可以存放数据的位置(sendx),然后通过runtime.typedmemmove将待发送的数据拷贝到缓存区中,并递增sendx索引(到达dataqsiz时回绕为0)和qcount计数器。后续有接收数据的goroutine时,可以直接从缓存区中读取。
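
下面的示例(笔者编写)演示了缓存区的行为:在缓存未满时发送不会阻塞,缓存满之后非阻塞发送会失败(阻塞发送则会进入3.4的流程)。

package main

import "fmt"

func main() {
   ch := make(chan int, 2)

   ch <- 1 // 写入buf[0],sendx变为1,qcount = 1
   ch <- 2 // 写入buf[1],sendx到达dataqsiz后回绕为0,qcount = 2

   select {
   case ch <- 3:
      fmt.Println("不会走到这里:缓存已满")
   default:
      fmt.Println("qcount == dataqsiz,非阻塞发送失败")
   }

   fmt.Println(<-ch, <-ch) // 1 2
}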

3.4 阻塞发送

如果既没有等待读的goroutine,又没有缓存区或者缓存区已满,那么就会阻塞发送数据:

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...

   if !block {
      unlock(&c.lock)
      return false
   }

   // Block on the channel. Some receiver will complete our operation for us.
   gp := getg()
   mysg := acquireSudog()
   mysg.releasetime = 0
   if t0 != 0 {
      mysg.releasetime = -1
   }
   // No stack splits between assigning elem and enqueuing mysg
   // on gp.waiting where copystack can find it.
   mysg.elem = ep
   mysg.waitlink = nil
   mysg.g = gp
   mysg.isSelect = false
   mysg.c = c
   gp.waiting = mysg
   gp.param = nil
   c.sendq.enqueue(mysg)
   // Signal to anyone trying to shrink our stack that we're about
   // to park on a channel. The window between when this G's status
   // changes and when we set gp.activeStackChans is not safe for
   // stack shrinking.
   atomic.Store8(&gp.parkingOnChan, 1)
   gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceEvGoBlockSend, 2)
   // Ensure the value being sent is kept alive until the
   // receiver copies it out. The sudog has a pointer to the
   // stack object, but sudogs aren't considered as roots of the
   // stack tracer.
   KeepAlive(ep)

   // someone woke us up.
   if mysg != gp.waiting {
      throw("G waiting list is corrupted")
   }
   gp.waiting = nil
   gp.activeStackChans = false
   closed := !mysg.success
   gp.param = nil
   if mysg.releasetime > 0 {
      blockevent(mysg.releasetime-t0, 2)
   }
   mysg.c = nil
   releaseSudog(mysg)
   if closed {
      if c.closed == 0 {
         throw("chansend: spurious wakeup")
      }
      panic(plainError("send on closed channel"))
   }
   return true
}

以上代码的主要流程为:

  • 调用runtime.getg获取当前发送数据的goroutine;
  • 调用runtime.acquireSudog获取sudog结构并设置相关信息;
  • 将上一步获取的sudog放入发送等待队列sendq,并调用gopark挂起当前goroutine;
  • 等到有接收数据的goroutine到来时,此goroutine会被唤醒并继续往下执行;如果是因为channel被close而被唤醒,则会触发后续的panic(见下面的示例)。
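
下面的示例(笔者编写)演示了最后一种情况:发送goroutine阻塞在sendq中,随后channel被close,它被唤醒后触发"send on closed channel"的panic。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan int) // 无缓存且没有接收方,发送会进入sendq并gopark

   go func() {
      defer func() {
         // 被close唤醒后,sudog的success为false,chansend检测到c.closed != 0后panic
         fmt.Println("recovered:", recover()) // send on closed channel
      }()
      ch <- 1 // 阻塞发送
   }()

   time.Sleep(100 * time.Millisecond) // 仅为演示:确保发送方已挂起
   close(ch)                          // 唤醒sendq中的goroutine,使其panic
   time.Sleep(100 * time.Millisecond)
}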

4. 数据接收

4.1 nil通道与已关闭通道的数据接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   // raceenabled: don't need to check ep, as it is always on the stack
   // or is new memory allocated by reflect.

   if debugChan {
      print("chanrecv: chan=", c, "\n")
   }

   if c == nil {
      if !block {
         return
      }
      gopark(nil, nil, waitReasonChanReceiveNilChan, traceEvGoStop, 2)
      throw("unreachable")
   }

   ...

   lock(&c.lock)

   if c.closed != 0 && c.qcount == 0 {
      if raceenabled {
         raceacquire(c.raceaddr())
      }
      unlock(&c.lock)
      if ep != nil {
         typedmemclr(c.elemtype, ep)
      }
      return true, false
   }

   ...
}

以上是通道接收时的一部分代码,可以看到:

  • 和发送数据时一样,如果通道是nil,非阻塞读会直接返回,阻塞读则会被gopark永久挂起;
  • 和发送数据时不一样的是,从一个已经关闭且缓存区为空(qcount == 0)的通道读取并不会panic,而是立即返回元素类型的零值,且第二个返回值为false(见下面的示例)。
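
下面的示例(笔者编写)演示了从已关闭且无缓存数据的通道读取的行为:不会阻塞,也不会panic,而是得到零值和false。

package main

import "fmt"

func main() {
   ch := make(chan int)
   close(ch)

   v, ok := <-ch
   fmt.Println(v, ok) // 0 false:通道已关闭且qcount == 0,立即返回零值和false
}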

4.2 直接接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if sg := c.sendq.dequeue(); sg != nil {
      // Found a waiting sender. If buffer is size 0, receive value
      // directly from sender. Otherwise, receive from head of queue
      // and add sender's value to the tail of the queue (both map to
      // the same buffer slot because the queue is full).
      recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
      return true, true
   }

   ...
}

channelsendq队列中包含处于等待状态的goroutine时,会取出等待的最早的写数据goroutine,然后调用runtime.recv进行发送:

func recv(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
   if c.dataqsiz == 0 {
      if raceenabled {
         racesync(c, sg)
      }
      if ep != nil {
         // copy data from sender
         recvDirect(c.elemtype, sg, ep)
      }
   } else {
      // Queue is full. Take the item at the
      // head of the queue. Make the sender enqueue
      // its item at the tail of the queue. Since the
      // queue is full, those are both the same slot.
      qp := chanbuf(c, c.recvx)
      if raceenabled {
         racenotify(c, c.recvx, nil)
         racenotify(c, c.recvx, sg)
      }
      // copy data from queue to receiver
      if ep != nil {
         typedmemmove(c.elemtype, ep, qp)
      }
      // copy data from sender to queue
      typedmemmove(c.elemtype, qp, sg.elem)
      c.recvx++
      if c.recvx == c.dataqsiz {
         c.recvx = 0
      }
      c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
   }
   sg.elem = nil
   gp := sg.g
   unlockf()
   gp.param = unsafe.Pointer(sg)
   sg.success = true
   if sg.releasetime != 0 {
      sg.releasetime = cputicks()
   }
   goready(gp, skip+1)
}

该函数会根据是否存在缓存区分别处理:

  • 如果不存在缓存区,则调用runtime.recvDirect函数直接将发送goroutine的数据拷贝到接收方的目标内存地址中,相当于直接从这个发送goroutine手里取数据;
  • 如果存在缓存区,那么先将缓存区头部(recvx处)的数据拷贝到接收方的目标内存地址中,再将该发送goroutine的数据拷贝到缓存区尾部。相当于先从缓存队列头部取出数据交给接收方,再把等待发送的数据补到队列尾部;可以看出,走到这个分支时缓存队列一定是满的。

最后无论哪种情况,都会调用goready唤醒这个发送goroutine(gp)。
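
下面的示例(笔者编写)对应"存在缓存区"的这条路径:缓存已满且有阻塞的发送方,接收方先拿到缓存头部的数据,阻塞发送方的数据被补到缓存尾部,FIFO顺序得以保持。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan int, 2)
   ch <- 1
   ch <- 2 // 缓存已满

   go func() {
      ch <- 3 // 缓存已满,该发送goroutine进入sendq阻塞
   }()
   time.Sleep(100 * time.Millisecond) // 仅为演示:确保发送方已挂起

   // 第一次接收:从缓存头部取出1,同时把阻塞发送方的3放到缓存尾部并唤醒它
   fmt.Println(<-ch) // 1
   fmt.Println(<-ch) // 2
   fmt.Println(<-ch) // 3
}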

4.3 从缓存区拿

其实这里的章节名描述并不准确,在4.2中也存在从缓存区拿数据的情况,差别在于:

  • 4.2中缓存队列是满的,且还有阻塞等待发送的goroutine;
  • 4.3中不存在阻塞等待发送的goroutine。

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if c.qcount > 0 {
      // Receive directly from queue
      qp := chanbuf(c, c.recvx)
      if raceenabled {
         racenotify(c, c.recvx, nil)
      }
      if ep != nil {
         typedmemmove(c.elemtype, ep, qp)
      }
      typedmemclr(c.elemtype, qp)
      c.recvx++
      if c.recvx == c.dataqsiz {
         c.recvx = 0
      }
      c.qcount--
      unlock(&c.lock)
      return true, true
   }

   ...
}

和发送时类似,如果缓存区中有数据,就直接从recvx位置将数据拷贝给接收方并清理该槽位,然后递增recvx(必要时回绕为0)、递减qcount。
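
一个常见的用法是:channel被close之后,缓存区中尚未取走的数据依然可以通过这条路径被读出,读完之后才会返回零值+false。下面是笔者编写的示例:

package main

import "fmt"

func main() {
   ch := make(chan int, 3)
   ch <- 1
   ch <- 2
   close(ch)

   // close后缓存中剩余的数据仍会走qcount > 0的分支被取出
   for v := range ch { // range在收到"零值+false"后自动退出循环
      fmt.Println(v) // 依次输出 1 2
   }
}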

4.4 阻塞接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if !block {
      unlock(&c.lock)
      return false, false
   }

   // no sender available: block on this channel.
   gp := getg()
   mysg := acquireSudog()
   mysg.releasetime = 0
   if t0 != 0 {
      mysg.releasetime = -1
   }
   // No stack splits between assigning elem and enqueuing mysg
   // on gp.waiting where copystack can find it.
   mysg.elem = ep
   mysg.waitlink = nil
   gp.waiting = mysg
   mysg.g = gp
   mysg.isSelect = false
   mysg.c = c
   gp.param = nil
   c.recvq.enqueue(mysg)
   // Signal to anyone trying to shrink our stack that we're about
   // to park on a channel. The window between when this G's status
   // changes and when we set gp.activeStackChans is not safe for
   // stack shrinking.
   atomic.Store8(&gp.parkingOnChan, 1)
   gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceEvGoBlockRecv, 2)

   // someone woke us up
   if mysg != gp.waiting {
      throw("G waiting list is corrupted")
   }
   gp.waiting = nil
   gp.activeStackChans = false
   if mysg.releasetime > 0 {
      blockevent(mysg.releasetime-t0, 2)
   }
   success := mysg.success
   gp.param = nil
   mysg.c = nil
   releaseSudog(mysg)
   return true, success
}

和阻塞发送类似,如果既没有等待发送的goroutine,也没有缓存区或者缓存区中没有数据,那么就需要将当前接收goroutine对应的sudog压入recvq,并通过gopark挂起,等待被唤醒。
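
下面的示例(笔者编写)演示了阻塞接收:接收方先进入recvq并被gopark挂起,直到发送方到来把它唤醒。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan string)

   go func() {
      time.Sleep(100 * time.Millisecond)
      ch <- "hello" // 发送方从recvq中取出等待的接收方,直接拷贝数据并将其goready唤醒
   }()

   fmt.Println(<-ch) // 接收方先挂起,被唤醒后打印hello
}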

5. 关闭

func closechan(c *hchan) {
   if c == nil {
      panic(plainError("close of nil channel"))
   }

   lock(&c.lock)
   if c.closed != 0 {
      unlock(&c.lock)
      panic(plainError("close of closed channel"))
   }

   if raceenabled {
      callerpc := getcallerpc()
      racewritepc(c.raceaddr(), callerpc, abi.FuncPCABIInternal(closechan))
      racerelease(c.raceaddr())
   }

   c.closed = 1

   var glist gList

   // release all readers
   for {
      sg := c.recvq.dequeue()
      if sg == nil {
         break
      }
      if sg.elem != nil {
         typedmemclr(c.elemtype, sg.elem)
         sg.elem = nil
      }
      if sg.releasetime != 0 {
         sg.releasetime = cputicks()
      }
      gp := sg.g
      gp.param = unsafe.Pointer(sg)
      sg.success = false
      if raceenabled {
         raceacquireg(gp, c.raceaddr())
      }
      glist.push(gp)
   }

   // release all writers (they will panic)
   for {
      sg := c.sendq.dequeue()
      if sg == nil {
         break
      }
      sg.elem = nil
      if sg.releasetime != 0 {
         sg.releasetime = cputicks()
      }
      gp := sg.g
      gp.param = unsafe.Pointer(sg)
      sg.success = false
      if raceenabled {
         raceacquireg(gp, c.raceaddr())
      }
      glist.push(gp)
   }
   unlock(&c.lock)

   // Ready all Gs now that we've dropped the channel lock.
   for !glist.empty() {
      gp := glist.pop()
      gp.schedlink = 0
      goready(gp, 3)
   }
}

关闭通道的代码看上去很长,实际上在处理完一些特殊情况(关闭nil通道或重复关闭都会panic)之后,就是把recvq和sendq中所有等待的goroutine统统用goready唤醒:等待接收的goroutine会拿到元素类型的零值(success为false),而等待发送的goroutine被唤醒后会panic。
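
下面的示例(笔者编写)演示了close对recvq的影响:所有阻塞的接收方都会被唤醒,并拿到零值和false。Sleep仅为演示时序。

package main

import (
   "fmt"
   "sync"
   "time"
)

func main() {
   ch := make(chan int)
   var wg sync.WaitGroup

   for i := 0; i < 3; i++ {
      wg.Add(1)
      go func(id int) {
         defer wg.Done()
         v, ok := <-ch
         fmt.Println("receiver", id, "got", v, ok) // 都输出 0 false
      }(i)
   }

   time.Sleep(100 * time.Millisecond) // 仅为演示:确保接收方都已挂起在recvq中
   close(ch)                          // 所有接收方被goready唤醒
   wg.Wait()
}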

6. 总结

在Go中,虽然官方极力推崇CSP哲学,推荐大家使用channel来保护共享内存,但是:

在幕后,channel使用锁来串行化访问并提供线程安全性。因此,通过channel来同步对内存的访问,你实际上还是在使用锁,只不过是一把被包装在线程安全队列中的锁。那么,与直接使用标准库sync包中的互斥锁相比,Go的这种花式锁表现又如何呢?以下数字是使用Go内置的基准测试功能,分别对两种实现的集合连续调用Put得到的。

`> BenchmarkSimpleSet-8 3000000 391 ns/op`
`> BenchmarkSimpleChannelSet-8 1000000 1699 ns/op`
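
原文并没有给出这两个基准测试的代码,下面是笔者按同样思路自行编写的一个简化示意(MutexSet、ChannelSet等名称均为笔者自拟),可以自行运行对比,具体数字会因机器和Go版本而异:

package chanbench

import (
   "sync"
   "testing"
)

// MutexSet:用互斥锁保护的简单集合
type MutexSet struct {
   mu sync.Mutex
   m  map[int]struct{}
}

func (s *MutexSet) Put(v int) {
   s.mu.Lock()
   s.m[v] = struct{}{}
   s.mu.Unlock()
}

// ChannelSet:用channel把写操作交给单个goroutine串行处理
type ChannelSet struct {
   ch chan int
   m  map[int]struct{}
}

func NewChannelSet() *ChannelSet {
   s := &ChannelSet{ch: make(chan int), m: make(map[int]struct{})}
   go func() {
      for v := range s.ch {
         s.m[v] = struct{}{}
      }
   }()
   return s
}

func (s *ChannelSet) Put(v int) { s.ch <- v }

func BenchmarkMutexSet(b *testing.B) {
   s := &MutexSet{m: make(map[int]struct{})}
   for i := 0; i < b.N; i++ {
      s.Put(i)
   }
}

func BenchmarkChannelSet(b *testing.B) {
   s := NewChannelSet()
   for i := 0; i < b.N; i++ {
      s.Put(i)
   }
}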

就我个人的理解而言:

  • 在进行数据的传输时使用channel;
  • 在进行内存数据的保护时使用sync.Mutex;
  • 利用channel和select的特性,实现类似于Linux epoll的多路复用功能(见下面的示例)。
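
关于最后一点,下面是笔者编写的一个小示例:select可以同时监听多个channel,哪个就绪就处理哪个,再配合time.After做超时控制,效果上类似于epoll对多个fd的事件多路复用。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch1 := make(chan string)
   ch2 := make(chan string)

   go func() { ch1 <- "from ch1" }()
   go func() { ch2 <- "from ch2" }()

   for i := 0; i < 2; i++ {
      select {
      case msg := <-ch1:
         fmt.Println(msg)
      case msg := <-ch2:
         fmt.Println(msg)
      case <-time.After(time.Second): // 超时保护
         fmt.Println("timeout")
      }
   }
}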