0. 简介

传统的并发编程模型基于对线程间共享内存的同步访问控制:共享数据受锁的保护,线程需要争夺这些锁才能访问数据。通常而言,使用线程安全的数据结构可以让这件事变得更容易一些。Go的并发原语(goroutine和channel)则提供了一种优雅的方式来构建并发模型。Go鼓励在goroutine之间使用channel来传递数据,而不是显式地使用锁来限制对共享数据的访问。

Do not communicate by sharing memory; instead, share memory by communicating.

这就是Go的并发哲学,它源自CSP(Communicating Sequential Processes)模型,这一模型也经常被认为是Go在并发编程上取得成功的关键因素。

如果说goroutine是Go语言程序的并发体,那么channel就是它们之间的通信机制。前面的系列博客已经介绍了goroutine及其调度机制,本文将介绍二者之间的通信机制:channel。

1. channel数据结构

type hchan struct {
   qcount   uint           // total data in the queue
   dataqsiz uint           // size of the circular queue
   buf      unsafe.Pointer // points to an array of dataqsiz elements
   elemsize uint16
   closed   uint32
   elemtype *_type // element type
   sendx    uint   // send index
   recvx    uint   // receive index
   recvq    waitq  // list of recv waiters
   sendq    waitq  // list of send waiters

   // lock protects all fields in hchan, as well as several
   // fields in sudogs blocked on this channel.
   //
   // Do not change another G's status while holding this lock
   // (in particular, do not ready a G), as this can deadlock
   // with stack shrinking.
   lock mutex
}

在runtime/chan.go中,channel被定义为如上的hchan结构体,其中:

  • buf:有缓存的channel用来存储缓存数据的指针,指向一个底层为数组的循环队列;
  • dataqsiz:上述循环队列的最大容量,对应cap();
  • qcount:上述循环队列中当前已有的元素个数,对应len();
  • recvx和sendx:分别表示上述循环队列中下一次接收和下一次发送的位置;
  • recvq和sendq:分别是等待接收和等待发送的goroutine抽象(sudog)队列,是个双向链表;
  • lock:互斥锁,用来保证channel数据的线程安全。
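
为了更直观地理解这些字段,下面给出一个笔者自行编写的用户层小例子(仅用于说明字段含义,不涉及runtime内部细节):len()和cap()的返回值分别对应hchan中的qcount和dataqsiz。

package main

import "fmt"

func main() {
   ch := make(chan int, 3) // dataqsiz = 3
   ch <- 1
   ch <- 2 // 此时缓存中有两个元素,qcount = 2

   fmt.Println(len(ch), cap(ch)) // 输出 2 3,分别对应qcount和dataqsiz
}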

2. channel创建

func makechan64(t *chantype, size int64) *hchan {
   if int64(int(size)) != size {
      panic(plainError("makechan: size out of range"))
   }

   return makechan(t, int(size))
}

func makechan(t *chantype, size int) *hchan {
   elem := t.elem

   // compiler checks this but be safe.
   if elem.size >= 1<<16 {
      throw("makechan: invalid channel element type")
   }
   if hchanSize%maxAlign != 0 || elem.align > maxAlign {
      throw("makechan: bad alignment")
   }

   mem, overflow := math.MulUintptr(elem.size, uintptr(size))
   if overflow || mem > maxAlloc-hchanSize || size < 0 {
      panic(plainError("makechan: size out of range"))
   }

   // Hchan does not contain pointers interesting for GC when elements stored in buf do not contain pointers.
   // buf points into the same allocation, elemtype is persistent.
   // SudoG's are referenced from their owning thread so they can't be collected.
   // TODO(dvyukov,rlh): Rethink when collector can move allocated objects.
   var c *hchan
   switch {
   case mem == 0:
      // Queue or element size is zero.
      c = (*hchan)(mallocgc(hchanSize, nil, true))
      // Race detector uses this location for synchronization.
      c.buf = c.raceaddr()
   case elem.ptrdata == 0:
      // Elements do not contain pointers.
      // Allocate hchan and buf in one call.
      c = (*hchan)(mallocgc(hchanSize+mem, nil, true))
      c.buf = add(unsafe.Pointer(c), hchanSize)
   default:
      // Elements contain pointers.
      c = new(hchan)
      c.buf = mallocgc(mem, elem, true)
   }

   c.elemsize = uint16(elem.size)
   c.elemtype = elem
   c.dataqsiz = uint(size)
   lockInit(&c.lock, lockRankHchan)

   if debugChan {
      print("makechan: chan=", c, "; elemsize=", elem.size, "; dataqsiz=", size, "\n")
   }
   return c
}

所有的调用最后都会走到runtime.makechan函数。这个函数做的事情比较简单,就是初始化一个runtime.hchan对象;和map一样,channel对外就是一个指针(切片和字符串则不是指针对象,以切片为例,可以参考链接)。可以看到:

  • 如果当前channel没有缓存,或者元素类型的大小为0,那么只会为runtime.hchan本身分配一段内存;
  • 如果当前channel中存储的元素类型不含指针,那么会为runtime.hchan和底层的缓存数组一次性分配一块连续的内存;
  • 其他情况下,则为runtime.hchan和其缓存各自分配一段内存。
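
下面是笔者给出的一个简单对照(仅为示意,三种make方式分别对应上述三个分配分支,具体分配策略属于runtime内部实现,可能随版本变化):

package main

func main() {
   c1 := make(chan struct{})  // 元素大小为0,mem == 0:只为hchan本身分配内存
   c2 := make(chan int, 8)    // 元素不含指针:hchan和buf一次性连续分配
   c3 := make(chan *int, 8)   // 元素含指针:hchan与buf各自分配,buf需要GC单独扫描
   _, _, _ = c1, c2, c3
}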

3. 数据发送

// entry point for c <- x from compiled code
//go:nosplit
func chansend1(c *hchan, elem unsafe.Pointer) {
   chansend(c, elem, true, getcallerpc())
}

向channel发送数据时,编译器会将 c <- x 转换为对runtime.chansend1的调用,而chansend1只是简单地调用了runtime.chansend。该函数比较长,我们一点一点分析:

3.1 nil通道的数据发送

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   if c == nil {
      if !block {
         return false
      }
      gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
      throw("unreachable")
   }

   ...
}

可以看到,如果通道是nil,那么往这个通道中写数据时:

  • 非阻塞写会直接返回false(使用单channel发送+default分支的select时,编译器会调用runtime.selectnbsend,此时block参数为false);
  • 阻塞写(正常的ch <- v)则会通过gopark函数让出CPU调度权,永久阻塞此goroutine(见下面的示例)。
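
下面是笔者编写的一个小示例,用来演示这两种情况(对nil通道的阻塞写只以注释说明,否则程序会因所有goroutine都被阻塞而报deadlock):

package main

import "fmt"

func main() {
   var ch chan int // nil channel

   // 带default分支的select编译后会走runtime.selectnbsend,即block == false的非阻塞发送
   select {
   case ch <- 1:
      fmt.Println("sent")
   default:
      fmt.Println("nil channel,非阻塞发送直接返回false,走default分支")
   }

   // 而普通的 ch <- 1 会通过gopark永久阻塞当前goroutine,
   // 如果所有goroutine都被阻塞,runtime会报 "all goroutines are asleep - deadlock!"
}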

3.2 直接发送

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...

   if c.closed != 0 {
      unlock(&c.lock)
      panic(plainError("send on closed channel"))
   }

   if sg := c.recvq.dequeue(); sg != nil {
      // Found a waiting receiver. We pass the value we want to send
      // directly to the receiver, bypassing the channel buffer (if any).
      send(c, sg, ep, func() { unlock(&c.lock) }, 3)
      return true
   }

   ...
}

可以发现,向已经关闭的channel发送数据会直接panic。

如果目标channel没有关闭,且有已经处于读等待的goroutine,那么会直接从recvq中取出最先陷入等待的goroutine,并通过runtime.send函数向其发送数据:

func send(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
   if raceenabled {
      if c.dataqsiz == 0 {
         racesync(c, sg)
      } else {
         // Pretend we go through the buffer, even though
         // we copy directly. Note that we need to increment
         // the head/tail locations only when raceenabled.
         racenotify(c, c.recvx, nil)
         racenotify(c, c.recvx, sg)
         c.recvx++
         if c.recvx == c.dataqsiz {
            c.recvx = 0
         }
         c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
      }
   }
   if sg.elem != nil {
      sendDirect(c.elemtype, sg, ep)
      sg.elem = nil
   }
   gp := sg.g
   unlockf()
   gp.param = unsafe.Pointer(sg)
   sg.success = true
   if sg.releasetime != 0 {
      sg.releasetime = cputicks()
   }
   goready(gp, skip+1)
}

可以看到,以上函数做了两件事:

  • 调用sendDirect函数将待发送的数据直接拷贝到接收goroutine的接收变量所在的内存地址上;
  • 通过goready函数唤醒接收goroutine,将其状态置为_Grunnable,并放到当前处理器运行队列的下一个待执行位置(runnext),等待被调度(见下面的示例)。
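
下面用一个笔者编写的示例来对应这条"直接发送"路径:接收方先阻塞在无缓存channel上(进入recvq),随后发送方发现recvq非空,直接把数据拷贝给它并将其唤醒。其中的Sleep仅为演示用,真实代码不应依赖这种时序假设。

package main

import (
   "fmt"
   "sync"
   "time"
)

func main() {
   ch := make(chan int) // 无缓存channel
   var wg sync.WaitGroup
   wg.Add(1)

   go func() {
      defer wg.Done()
      fmt.Println("received:", <-ch) // 接收方先进入recvq等待
   }()

   time.Sleep(100 * time.Millisecond) // 仅为演示:让接收方先挂起
   ch <- 42                           // 发送方发现recvq非空,走send()直接拷贝并goready唤醒接收方
   wg.Wait()
}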

3.3 缓存区

如果没有已经处于读等待的goroutine,且创建的channel包含缓存,并且缓存还没有满,那么会执行以下代码:

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...
   if c.qcount < c.dataqsiz {
      // Space is available in the channel buffer. Enqueue the element to send.
      qp := chanbuf(c, c.sendx)
      if raceenabled {
         racenotify(c, c.sendx, nil)
      }
      typedmemmove(c.elemtype, qp, ep)
      c.sendx++
      if c.sendx == c.dataqsiz {
         c.sendx = 0
      }
      c.qcount++
      unlock(&c.lock)
      return true
   }
   ...
}

这里会首先通过runtime.chanbuf计算出缓存区中下一个可以存放数据的位置(sendx),然后通过runtime.typedmemmove将待发送的数据拷贝到缓存区中,并递增sendx索引(到达dataqsiz时回绕为0)和qcount计数器。后续有接收数据的goroutine时,可以直接从缓存区中读取。
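
下面的示例(笔者编写)演示了缓存区的行为:在缓存未满时发送不会阻塞,缓存满之后非阻塞发送会失败(阻塞发送则会进入3.4的流程)。

package main

import "fmt"

func main() {
   ch := make(chan int, 2)

   ch <- 1 // 写入buf[0],sendx变为1,qcount = 1
   ch <- 2 // 写入buf[1],sendx到达dataqsiz后回绕为0,qcount = 2

   select {
   case ch <- 3:
      fmt.Println("不会走到这里:缓存已满")
   default:
      fmt.Println("qcount == dataqsiz,非阻塞发送失败")
   }

   fmt.Println(<-ch, <-ch) // 1 2
}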

3.4 阻塞发送

如果既没有等待读的goroutine,又没有缓存区或者缓存区已满,那么就会阻塞发送数据:

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   ...

   if !block {
      unlock(&c.lock)
      return false
   }

   // Block on the channel. Some receiver will complete our operation for us.
   gp := getg()
   mysg := acquireSudog()
   mysg.releasetime = 0
   if t0 != 0 {
      mysg.releasetime = -1
   }
   // No stack splits between assigning elem and enqueuing mysg
   // on gp.waiting where copystack can find it.
   mysg.elem = ep
   mysg.waitlink = nil
   mysg.g = gp
   mysg.isSelect = false
   mysg.c = c
   gp.waiting = mysg
   gp.param = nil
   c.sendq.enqueue(mysg)
   // Signal to anyone trying to shrink our stack that we're about
   // to park on a channel. The window between when this G's status
   // changes and when we set gp.activeStackChans is not safe for
   // stack shrinking.
   atomic.Store8(&gp.parkingOnChan, 1)
   gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceEvGoBlockSend, 2)
   // Ensure the value being sent is kept alive until the
   // receiver copies it out. The sudog has a pointer to the
   // stack object, but sudogs aren't considered as roots of the
   // stack tracer.
   KeepAlive(ep)

   // someone woke us up.
   if mysg != gp.waiting {
      throw("G waiting list is corrupted")
   }
   gp.waiting = nil
   gp.activeStackChans = false
   closed := !mysg.success
   gp.param = nil
   if mysg.releasetime > 0 {
      blockevent(mysg.releasetime-t0, 2)
   }
   mysg.c = nil
   releaseSudog(mysg)
   if closed {
      if c.closed == 0 {
         throw("chansend: spurious wakeup")
      }
      panic(plainError("send on closed channel"))
   }
   return true
}

以上代码的主要流程为:

  • 调用runtime.getg获取当前发送数据的goroutine;
  • 调用runtime.acquireSudog获取sudog结构并设置相关信息;
  • 将上一步获取的sudog放入发送等待队列sendq,并调用gopark挂起当前goroutine;
  • 等到有接收数据的goroutine到来时,此goroutine会被唤醒并继续往下执行;如果是因为channel被close而被唤醒,则会触发后续的panic(见下面的示例)。
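
下面的示例(笔者编写)演示了最后一种情况:发送goroutine阻塞在sendq中,随后channel被close,它被唤醒后触发"send on closed channel"的panic。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan int) // 无缓存且没有接收方,发送会进入sendq并gopark

   go func() {
      defer func() {
         // 被close唤醒后,sudog的success为false,chansend检测到c.closed != 0后panic
         fmt.Println("recovered:", recover()) // send on closed channel
      }()
      ch <- 1 // 阻塞发送
   }()

   time.Sleep(100 * time.Millisecond) // 仅为演示:确保发送方已挂起
   close(ch)                          // 唤醒sendq中的goroutine,使其panic
   time.Sleep(100 * time.Millisecond)
}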

4. 数据接收

4.1 nil通道与已关闭通道的数据接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   // raceenabled: don't need to check ep, as it is always on the stack
   // or is new memory allocated by reflect.

   if debugChan {
      print("chanrecv: chan=", c, "\n")
   }

   if c == nil {
      if !block {
         return
      }
      gopark(nil, nil, waitReasonChanReceiveNilChan, traceEvGoStop, 2)
      throw("unreachable")
   }

   ...

   lock(&c.lock)

   if c.closed != 0 && c.qcount == 0 {
      if raceenabled {
         raceacquire(c.raceaddr())
      }
      unlock(&c.lock)
      if ep != nil {
         typedmemclr(c.elemtype, ep)
      }
      return true, false
   }

   ...
}

以上是通道接收时的一部分代码,可以看到:

  • 和发送数据时一样,如果通道是nil,非阻塞读会直接返回,阻塞读则会被gopark永久挂起;
  • 和发送数据时不一样的是,从一个已经关闭且缓存区为空(qcount == 0)的通道读取并不会panic,而是立即返回元素类型的零值,且第二个返回值为false(见下面的示例)。
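
下面的示例(笔者编写)演示了从已关闭且无缓存数据的通道读取的行为:不会阻塞,也不会panic,而是得到零值和false。

package main

import "fmt"

func main() {
   ch := make(chan int)
   close(ch)

   v, ok := <-ch
   fmt.Println(v, ok) // 0 false:通道已关闭且qcount == 0,立即返回零值和false
}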

4.2 直接接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if sg := c.sendq.dequeue(); sg != nil {
      // Found a waiting sender. If buffer is size 0, receive value
      // directly from sender. Otherwise, receive from head of queue
      // and add sender's value to the tail of the queue (both map to
      // the same buffer slot because the queue is full).
      recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
      return true, true
   }

   ...
}

channelsendq队列中包含处于等待状态的goroutine时,会取出等待的最早的写数据goroutine,然后调用runtime.recv进行发送:

func recv(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
   if c.dataqsiz == 0 {
      if raceenabled {
         racesync(c, sg)
      }
      if ep != nil {
         // copy data from sender
         recvDirect(c.elemtype, sg, ep)
      }
   } else {
      // Queue is full. Take the item at the
      // head of the queue. Make the sender enqueue
      // its item at the tail of the queue. Since the
      // queue is full, those are both the same slot.
      qp := chanbuf(c, c.recvx)
      if raceenabled {
         racenotify(c, c.recvx, nil)
         racenotify(c, c.recvx, sg)
      }
      // copy data from queue to receiver
      if ep != nil {
         typedmemmove(c.elemtype, ep, qp)
      }
      // copy data from sender to queue
      typedmemmove(c.elemtype, qp, sg.elem)
      c.recvx++
      if c.recvx == c.dataqsiz {
         c.recvx = 0
      }
      c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
   }
   sg.elem = nil
   gp := sg.g
   unlockf()
   gp.param = unsafe.Pointer(sg)
   sg.success = true
   if sg.releasetime != 0 {
      sg.releasetime = cputicks()
   }
   goready(gp, skip+1)
}

该函数会根据是否存在缓存区分别处理:

  • 如果不存在缓存区,则调用runtime.recvDirect函数直接将发送goroutine的数据拷贝到接收方的目标内存地址中,相当于直接从这个发送goroutine手里取数据;
  • 如果存在缓存区,那么先将缓存区头部(recvx处)的数据拷贝到接收方的目标内存地址中,再将该发送goroutine的数据拷贝到缓存区尾部。相当于先从缓存队列头部取出数据交给接收方,再把等待发送的数据补到队列尾部;可以看出,走到这个分支时缓存队列一定是满的。

最后无论哪种情况,都会调用goready唤醒这个发送goroutine(gp)。
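
下面的示例(笔者编写)对应"存在缓存区"的这条路径:缓存已满且有阻塞的发送方,接收方先拿到缓存头部的数据,阻塞发送方的数据被补到缓存尾部,FIFO顺序得以保持。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan int, 2)
   ch <- 1
   ch <- 2 // 缓存已满

   go func() {
      ch <- 3 // 缓存已满,该发送goroutine进入sendq阻塞
   }()
   time.Sleep(100 * time.Millisecond) // 仅为演示:确保发送方已挂起

   // 第一次接收:从缓存头部取出1,同时把阻塞发送方的3放到缓存尾部并唤醒它
   fmt.Println(<-ch) // 1
   fmt.Println(<-ch) // 2
   fmt.Println(<-ch) // 3
}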

4.3 从缓存区拿

其实这里的章节名描述并不准确,在4.2中也存在从缓存区拿数据的情况,差别在于:

  • 4.2中缓存队列是满的,且还有阻塞等待发送的goroutine;
  • 4.3中不存在阻塞等待发送的goroutine。

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if c.qcount > 0 {
      // Receive directly from queue
      qp := chanbuf(c, c.recvx)
      if raceenabled {
         racenotify(c, c.recvx, nil)
      }
      if ep != nil {
         typedmemmove(c.elemtype, ep, qp)
      }
      typedmemclr(c.elemtype, qp)
      c.recvx++
      if c.recvx == c.dataqsiz {
         c.recvx = 0
      }
      c.qcount--
      unlock(&c.lock)
      return true, true
   }

   ...
}

和发送时类似,如果缓存区中有数据,就直接从recvx位置将数据拷贝给接收方并清理该槽位,然后递增recvx(必要时回绕为0)、递减qcount。
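
一个常见的用法是:channel被close之后,缓存区中尚未取走的数据依然可以通过这条路径被读出,读完之后才会返回零值+false。下面是笔者编写的示例:

package main

import "fmt"

func main() {
   ch := make(chan int, 3)
   ch <- 1
   ch <- 2
   close(ch)

   // close后缓存中剩余的数据仍会走qcount > 0的分支被取出
   for v := range ch { // range在收到"零值+false"后自动退出循环
      fmt.Println(v) // 依次输出 1 2
   }
}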

4.4 阻塞接收

func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
   ...

   if !block {
      unlock(&c.lock)
      return false, false
   }

   // no sender available: block on this channel.
   gp := getg()
   mysg := acquireSudog()
   mysg.releasetime = 0
   if t0 != 0 {
      mysg.releasetime = -1
   }
   // No stack splits between assigning elem and enqueuing mysg
   // on gp.waiting where copystack can find it.
   mysg.elem = ep
   mysg.waitlink = nil
   gp.waiting = mysg
   mysg.g = gp
   mysg.isSelect = false
   mysg.c = c
   gp.param = nil
   c.recvq.enqueue(mysg)
   // Signal to anyone trying to shrink our stack that we're about
   // to park on a channel. The window between when this G's status
   // changes and when we set gp.activeStackChans is not safe for
   // stack shrinking.
   atomic.Store8(&gp.parkingOnChan, 1)
   gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceEvGoBlockRecv, 2)

   // someone woke us up
   if mysg != gp.waiting {
      throw("G waiting list is corrupted")
   }
   gp.waiting = nil
   gp.activeStackChans = false
   if mysg.releasetime > 0 {
      blockevent(mysg.releasetime-t0, 2)
   }
   success := mysg.success
   gp.param = nil
   mysg.c = nil
   releaseSudog(mysg)
   return true, success
}

和阻塞发送类似,如果既没有等待发送的goroutine,也没有缓存区或者缓存区中没有数据,那么就需要将当前接收goroutine对应的sudog压入recvq,并通过gopark挂起,等待被唤醒。
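
下面的示例(笔者编写)演示了阻塞接收:接收方先进入recvq并被gopark挂起,直到发送方到来把它唤醒。Sleep仅为演示时序。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch := make(chan string)

   go func() {
      time.Sleep(100 * time.Millisecond)
      ch <- "hello" // 发送方从recvq中取出等待的接收方,直接拷贝数据并将其goready唤醒
   }()

   fmt.Println(<-ch) // 接收方先挂起,被唤醒后打印hello
}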

5. 关闭

func closechan(c *hchan) {
   if c == nil {
      panic(plainError("close of nil channel"))
   }

   lock(&c.lock)
   if c.closed != 0 {
      unlock(&c.lock)
      panic(plainError("close of closed channel"))
   }

   if raceenabled {
      callerpc := getcallerpc()
      racewritepc(c.raceaddr(), callerpc, abi.FuncPCABIInternal(closechan))
      racerelease(c.raceaddr())
   }

   c.closed = 1

   var glist gList

   // release all readers
   for {
      sg := c.recvq.dequeue()
      if sg == nil {
         break
      }
      if sg.elem != nil {
         typedmemclr(c.elemtype, sg.elem)
         sg.elem = nil
      }
      if sg.releasetime != 0 {
         sg.releasetime = cputicks()
      }
      gp := sg.g
      gp.param = unsafe.Pointer(sg)
      sg.success = false
      if raceenabled {
         raceacquireg(gp, c.raceaddr())
      }
      glist.push(gp)
   }

   // release all writers (they will panic)
   for {
      sg := c.sendq.dequeue()
      if sg == nil {
         break
      }
      sg.elem = nil
      if sg.releasetime != 0 {
         sg.releasetime = cputicks()
      }
      gp := sg.g
      gp.param = unsafe.Pointer(sg)
      sg.success = false
      if raceenabled {
         raceacquireg(gp, c.raceaddr())
      }
      glist.push(gp)
   }
   unlock(&c.lock)

   // Ready all Gs now that we've dropped the channel lock.
   for !glist.empty() {
      gp := glist.pop()
      gp.schedlink = 0
      goready(gp, 3)
   }
}

关闭通道的代码看上去很长,实际上在处理完一些特殊情况(关闭nil通道或重复关闭都会panic)之后,就是把recvq和sendq中所有等待的goroutine统统用goready唤醒:等待接收的goroutine会拿到元素类型的零值(success为false),而等待发送的goroutine被唤醒后会panic。
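
下面的示例(笔者编写)演示了close对recvq的影响:所有阻塞的接收方都会被唤醒,并拿到零值和false。Sleep仅为演示时序。

package main

import (
   "fmt"
   "sync"
   "time"
)

func main() {
   ch := make(chan int)
   var wg sync.WaitGroup

   for i := 0; i < 3; i++ {
      wg.Add(1)
      go func(id int) {
         defer wg.Done()
         v, ok := <-ch
         fmt.Println("receiver", id, "got", v, ok) // 都输出 0 false
      }(i)
   }

   time.Sleep(100 * time.Millisecond) // 仅为演示:确保接收方都已挂起在recvq中
   close(ch)                          // 所有接收方被goready唤醒
   wg.Wait()
}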

6. 总结

在Go中,虽然官方极力推崇CSP哲学,推荐大家使用channel来保护共享内存,但是:

在幕后,channel使用锁来串行化访问并提供线程安全性。因此,通过channel来同步对内存的访问,你实际上还是在使用锁,只不过是一把被包装在线程安全队列中的锁。那么,与直接使用标准库sync包中的互斥锁相比,Go的这种花式锁表现又如何呢?以下数字是使用Go内置的基准测试功能,分别对两种实现的集合连续调用Put得到的。

`> BenchmarkSimpleSet-8 3000000 391 ns/op`
`> BenchmarkSimpleChannelSet-8 1000000 1699 ns/op`
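
原文并没有给出这两个基准测试的代码,下面是笔者按同样思路自行编写的一个简化示意(MutexSet、ChannelSet等名称均为笔者自拟),可以自行运行对比,具体数字会因机器和Go版本而异:

package chanbench

import (
   "sync"
   "testing"
)

// MutexSet:用互斥锁保护的简单集合
type MutexSet struct {
   mu sync.Mutex
   m  map[int]struct{}
}

func (s *MutexSet) Put(v int) {
   s.mu.Lock()
   s.m[v] = struct{}{}
   s.mu.Unlock()
}

// ChannelSet:用channel把写操作交给单个goroutine串行处理
type ChannelSet struct {
   ch chan int
   m  map[int]struct{}
}

func NewChannelSet() *ChannelSet {
   s := &ChannelSet{ch: make(chan int), m: make(map[int]struct{})}
   go func() {
      for v := range s.ch {
         s.m[v] = struct{}{}
      }
   }()
   return s
}

func (s *ChannelSet) Put(v int) { s.ch <- v }

func BenchmarkMutexSet(b *testing.B) {
   s := &MutexSet{m: make(map[int]struct{})}
   for i := 0; i < b.N; i++ {
      s.Put(i)
   }
}

func BenchmarkChannelSet(b *testing.B) {
   s := NewChannelSet()
   for i := 0; i < b.N; i++ {
      s.Put(i)
   }
}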

就我个人的理解而言:

  • 在进行数据的传输时使用channel;
  • 在进行内存数据的保护时使用sync.Mutex;
  • 利用channel和select的特性,实现类似于Linux epoll的多路复用功能(见下面的示例)。
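
关于最后一点,下面是笔者编写的一个小示例:select可以同时监听多个channel,哪个就绪就处理哪个,再配合time.After做超时控制,效果上类似于epoll对多个fd的事件多路复用。

package main

import (
   "fmt"
   "time"
)

func main() {
   ch1 := make(chan string)
   ch2 := make(chan string)

   go func() { ch1 <- "from ch1" }()
   go func() { ch2 <- "from ch2" }()

   for i := 0; i < 2; i++ {
      select {
      case msg := <-ch1:
         fmt.Println(msg)
      case msg := <-ch2:
         fmt.Println(msg)
      case <-time.After(time.Second): // 超时保护
         fmt.Println("timeout")
      }
   }
}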