Upstream Service Jitter Caused by Application Deployment: Analysis and Optimization in Practice


Author: Zhu Yongchang, JD Logistics

Background

As the dedicated gateway of the trade order center, the Baichuan (百川) traffic-routing system exposes a unified set of standard services to external callers (order creation, modification, cancellation, status callback, and so on) and internally dispatches traffic to the applications of the various business lines according to configured rules. As more and more traffic was switched onto Baichuan, deployments of the system increasingly caused service jitter and, with it, call timeouts in upstream systems. To keep the trading services stable and raise system availability, this problem had to be addressed.

An existing mitigation is the warm-up solution officially provided by JSF; it and the other available options are compared in the Solutions section at the end of this article.

Against this background, we took the upstream jitter caused by deployments of the Baichuan system as a concrete case, followed the observable clues, and dug into the JSF source code until we found the key factors behind the jitter. We then built a more effective warm-up scheme. Verification shows that the warm-up works: the MAX value of the caller-side method performance dropped by 90%, back within the timeout threshold, eliminating the upstream call timeouts previously caused by machine deployments.

Problem Symptoms

The UMP metrics of the service in question show that the MAX value of its method performance peaked at 3073 ms, well below the 10000 ms timeout configured by the caller (see Figure 1).

The PFinder performance monitoring, however, shows that the MAX value of the upstream caller's invocations of this service repeatedly exceeded 10000 ms (the caller's own UMP metrics can be checked directly; if the caller cannot provide them, PFinder's application topology view can be used instead, see Figure 2).

Analysis Approach

The symptoms above show that, during deployment, the provider-side MAX value barely moves while the caller-side MAX value jitters badly. The time is therefore not being spent inside the provider's processing logic, but before that logic is entered (or after it returns). What exactly happens before and after? Rather than answer that straight away, let us follow the available clues step by step.

Clue 1: the machine's CPU spikes briefly during deployment (see Figure 3)

This can be mitigated with the JSF delayed-publish parameter, so that the provider only goes online after the CPU has settled. The configuration looks like this:

<!-- delay publishing the provider by 2 minutes -->
<jsf:provider id="createExpressOrderService"
              interface="cn.jdl.oms.api.CreateExpressOrderService"
              ref="createExpressOrderServiceImpl"
              register="true"
              concurrents="400"
              alias="${provider.express.oms}"
              delay="120000">
</jsf:provider>

In practice, however, although the JSF service did go online two minutes late (see Figure 4), by which time the CPU had already settled, the moment it went online the CPU spiked a second time, and the caller still saw call timeouts.

Clue 2: the JVM thread count spikes the moment the JSF service goes online (see Figure 5)

Inspecting the thread stacks with jstack shows that the threads growing fastest in number are JSF-BZ threads, all of them blocked waiting:

"JSF-BZ-22000-137-T-350" #1038 daemon prio=5 os_prio=0 tid=0x00007f02bcde9000 nid=0x6fff waiting on condition [0x00007efa10284000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-BZ-22000-137-T-349" #1037 daemon prio=5 os_prio=0 tid=0x00007f02bcde7000 nid=0x6ffe waiting on condition [0x00007efa10305000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-BZ-22000-137-T-348" #1036 daemon prio=5 os_prio=0 tid=0x00007f02bcdd8000 nid=0x6ffd waiting on condition [0x00007efa10386000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000640b359e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

...


Searching the JSF source code for the keyword "JSF-BZ" leads to the initialization code of the "JSF-BZ" thread pool:

private static synchronized ThreadPoolExecutor initPool(ServerTransportConfig transportConfig) {
    final int minPoolSize, aliveTime, port = transportConfig.getPort();

    int maxPoolSize = transportConfig.getServerBusinessPoolSize();
    String poolType = transportConfig.getServerBusinessPoolType();
    if ("fixed".equals(poolType)) {
        minPoolSize = maxPoolSize;
        aliveTime = 0;
    } else if ("cached".equals(poolType)) {
        minPoolSize = 20;
        maxPoolSize = Math.max(minPoolSize, maxPoolSize);
        aliveTime = 60000;
    } else {
        throw new IllegalConfigureException(21401, "server.threadpool", poolType);
    }

    String queueType = transportConfig.getPoolQueueType();
    int queueSize = transportConfig.getPoolQueueSize();
    boolean isPriority = "priority".equals(queueType);
    BlockingQueue<Runnable> configQueue = ThreadPoolUtils.buildQueue(queueSize, isPriority);

    NamedThreadFactory threadFactory = new NamedThreadFactory("JSF-BZ-" + port, true);
    RejectedExecutionHandler handler = new RejectedExecutionHandler() {
        private int i = 1;

        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            if (this.i++ % 7 == 0) {
                this.i = 1;
                BusinessPool.LOGGER.warn("[JSF-23002]Task:{} has been reject for ThreadPool exhausted! pool:{}, active:{}, queue:{}, taskcnt: {}",
                        new Object[] { r, Integer.valueOf(executor.getPoolSize()), Integer.valueOf(executor.getActiveCount()),
                                Integer.valueOf(executor.getQueue().size()), Long.valueOf(executor.getTaskCount()) });
            }

            RejectedExecutionException err = new RejectedExecutionException(
                    "[JSF-23003]Biz thread pool of provider has bean exhausted, the server port is " + port);
            ProviderErrorHook.getErrorHookInstance().onProcess(new ProviderErrorEvent(err));
            throw err;
        }
    };
    LOGGER.debug("Build " + poolType + " business pool for port " + port + " [min: " + minPoolSize + " max:" + maxPoolSize
            + " queueType:" + queueType + " queueSize:" + queueSize + " aliveTime:" + aliveTime + "]");

    return new ThreadPoolExecutor(minPoolSize, maxPoolSize, aliveTime, TimeUnit.MILLISECONDS, configQueue, (ThreadFactory) threadFactory, handler);
}

public static BlockingQueue<Runnable> buildQueue(int size, boolean isPriority) {
    BlockingQueue<Runnable> queue;
    if (size == 0) {
        queue = new SynchronousQueue<Runnable>();
    } else if (isPriority) {
        queue = (size < 0) ? new PriorityBlockingQueue<Runnable>() : new PriorityBlockingQueue<Runnable>(size);
    } else {
        queue = (size < 0) ? new LinkedBlockingQueue<Runnable>() : new LinkedBlockingQueue<Runnable>(size);
    }

    return queue;
}


In addition, the JSF documentation describes this thread pool as follows:

A SynchronousQueue is a synchronous handoff queue: every put must wait for a take, and vice versa. By default the JSF-BZ pool is an elastic, queue-less thread pool with 20 initial threads, so at the instant the JSF service goes online a flood of concurrent requests arrives, the 20 initial threads are nowhere near enough, and the pool has to create a large number of new threads; the sketch below illustrates this behavior.
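This behavior is easy to reproduce outside JSF. Below is a minimal, hypothetical sketch (not JSF code; the pool sizes are made up) of a cached-style pool backed by a SynchronousQueue handling a sudden burst of tasks:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demo only: a "cached"-style pool similar in shape to JSF-BZ, with 20 core threads,
// a larger maximum and a SynchronousQueue that never buffers tasks.
public class SynchronousQueuePoolDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                20, 200, 60_000, TimeUnit.MILLISECONDS, new SynchronousQueue<>());

        // Simulate a burst of concurrent requests arriving right after the provider goes online
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200); // pretend each request takes 200 ms
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Because the SynchronousQueue holds no tasks, every task that arrives while all
        // existing threads are busy forces the pool to create a brand-new thread.
        System.out.println("pool size after burst: " + pool.getPoolSize()); // roughly 100
        pool.shutdown();
    }
}

Creating dozens of threads like this under live traffic is exactly the cost we want to avoid, so the JSF-BZ business pool can instead be warmed up during application startup; the warm-up code we use is shown below: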

// Obtain the JSF ServerBean(s) from the Spring context; there may be more than one
Map<String, ServerBean> serverBeanMap = applicationContext.getBeansOfType(ServerBean.class);
if (CollectionUtils.isEmpty(serverBeanMap)) {
    log.error("application preheat, jsf thread pool preheat failed, serverBeanMap is empty.");
    return;
}

// Iterate over all ServerBeans and warm each one up
serverBeanMap.forEach((serverBeanName, serverBean) -> {
    if (Objects.isNull(serverBean)) {
        log.error("application preheat, jsf thread pool preheat failed, serverBean is null, serverBeanName:{}", serverBeanName);
        return;
    }
    // Start the ServerBean; the Server is only available after it has started
    serverBean.start();
    Server server = serverBean.getServer();
    if (Objects.isNull(server)) {
        log.error("application preheat, jsf thread pool preheat failed, JSF Server is null, serverBeanName:{}", serverBeanName);
        return;
    }

    ServerTransportConfig serverTransportConfig = server.getTransportConfig();
    if (Objects.isNull(serverTransportConfig)) {
        log.error("application preheat, jsf thread pool preheat failed, serverTransportConfig is null, serverBeanName:{}", serverBeanName);
        return;
    }
    // Obtain the JSF business thread pool
    ThreadPoolExecutor businessPool = BusinessPool.getBusinessPool(serverTransportConfig);
    if (Objects.isNull(businessPool)) {
        log.error("application preheat, jsf biz pool preheat failed, businessPool is null, serverBeanName:{}", serverBeanName);
        return;
    }

    int corePoolSize = businessPool.getCorePoolSize();
    int maxCorePoolSize = Math.max(corePoolSize, 500);

    if (maxCorePoolSize > corePoolSize) {
        // Raise the core pool size of the JSF server business pool
        businessPool.setCorePoolSize(maxCorePoolSize);
    }
    // Pre-start all core threads of the JSF business thread pool
    if (businessPool.getPoolSize() < maxCorePoolSize) {
        businessPool.prestartAllCoreThreads();
    }
});
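A natural place to trigger this logic is right after the Spring context is ready, which (with delayed publishing) is still before the JSF provider starts taking traffic. The wiring below is a hypothetical sketch that assumes the snippet above is wrapped in a method named preheatJsfBusinessPool; JSF itself does not provide such a listener:

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.stereotype.Component;

// Hypothetical wiring: run the JSF-BZ warm-up once the Spring context has been refreshed,
// i.e. before the (delayed) JSF provider actually goes online.
@Component
public class JsfPreheatListener implements ApplicationListener<ContextRefreshedEvent> {

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        ApplicationContext applicationContext = event.getApplicationContext();
        // Assumed to contain the JSF-BZ thread pool warm-up shown above
        preheatJsfBusinessPool(applicationContext);
    }

    private void preheatJsfBusinessPool(ApplicationContext applicationContext) {
        // ... JSF-BZ thread pool warm-up from the previous snippet ...
    }
}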


Clue 3: even with the JSF-BZ thread pool warmed up, the JVM thread count still rises the moment the JSF service goes online

Taking another thread dump shows that the growth now comes from JSF-SEV-WORKER threads:

"JSF-SEV-WORKER-139-T-129" #1295 daemon prio=5 os_prio=0 tid=0x00007ef66000b800 nid=0x7289 runnable [0x00007ef627cf8000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000644f558b8> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641eaaca0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641eaab88> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-SEV-WORKER-139-T-128" #1293 daemon prio=5 os_prio=0 tid=0x00007ef60c002800 nid=0x7288 runnable [0x00007ef627b74000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000641ea7450> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641e971e8> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641e970d0> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"JSF-SEV-WORKER-139-T-127" #1291 daemon prio=5 os_prio=0 tid=0x00007ef608001000 nid=0x7286 runnable [0x00007ef627df9000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0000000641e93998> (a io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0x0000000641e83730> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000641e83618> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:805)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None


So what are the JSF-SEV-WORKER threads for, and can they be warmed up as well? With these questions we went back to the JSF source:

private synchronized EventLoopGroup initChildEventLoopGroup() {
    NioEventLoopGroup nioEventLoopGroup = null;
    int threads = (this.childNioEventThreads > 0) ? this.childNioEventThreads : Math.max(8, Constants.DEFAULT_IO_THREADS);

    NamedThreadFactory threadName = new NamedThreadFactory("JSF-SEV-WORKER", isDaemon());
    EventLoopGroup eventLoopGroup = null;
    if (isUseEpoll()) {
        EpollEventLoopGroup epollEventLoopGroup = new EpollEventLoopGroup(threads, (ThreadFactory) threadName);
    } else {
        nioEventLoopGroup = new NioEventLoopGroup(threads, (ThreadFactory) threadName);
    }
    return (EventLoopGroup) nioEventLoopGroup;
}


The source shows that JSF-SEV-WORKER threads are created by Netty, which JSF uses internally for network I/O. Reading the source further, we also found a way to warm up the JSF-SEV-WORKER threads:

// Obtain the NioEventLoopGroup from serverTransportConfig
// (serverTransportConfig can be obtained the same way as in the JSF-BZ warm-up code above)
NioEventLoopGroup eventLoopGroup = (NioEventLoopGroup) serverTransportConfig.getChildEventLoopGroup();

int threadSize = this.jsfSevWorkerThreads;
while (threadSize-- > 0) {
    new Thread(() -> {
        // Submitting a task by hand makes Netty create a JSF-SEV-WORKER thread, which is the warm-up effect we want
        eventLoopGroup.submit(() -> log.info("submit thread to netty by hand, threadName:{}", Thread.currentThread().getName()));
    }).start();
}


The warm-up effect on the JSF-BZ and JSF-SEV-WORKER threads is shown in the figure below:

Digging for Clues in the Source Code

Reading the JSF source carefully, we also found that JSF encodes, decodes, serializes and deserializes interface parameters and return values, and inside these code paths we made a pleasant discovery: local caches. Part of the relevant source is shown below:

DESC_CLASS_CACHE

private static final ConcurrentMap<String, Class<?>> DESC_CLASS_CACHE = new ConcurrentHashMap<String, Class<?>>();

private static Class<?> desc2class(ClassLoader cl, String desc) throws ClassNotFoundException {
  switch (desc.charAt(0)) {
    case 'V':
      return void.class;
    case 'Z': return boolean.class;
    case 'B': return byte.class;
    case 'C': return char.class;
    case 'D': return double.class;
    case 'F': return float.class;
    case 'I': return int.class;
    case 'J': return long.class;
    case 'S': return short.class;
    case 'L':
      desc = desc.substring(1, desc.length() - 1).replace('/', '.');
      break;
    case '[':
      desc = desc.replace('/', '.');
      break;
    default:
      throw new ClassNotFoundException("Class not found: " + desc);
  }

  if (cl == null)
    cl = ClassLoaderUtils.getCurrentClassLoader();
  Class<?> clazz = DESC_CLASS_CACHE.get(desc);
  if (clazz == null) {
    clazz = Class.forName(desc, true, cl);
    DESC_CLASS_CACHE.put(desc, clazz);
  }
  return clazz;
}


NAME_CLASS_CACHE

private static final ConcurrentMap<String, Class<?>> NAME_CLASS_CACHE = new ConcurrentHashMap<String, Class<?>>();

private static Class<?> name2class(ClassLoader cl, String name) throws ClassNotFoundException {
  int c = 0, index = name.indexOf('[');
  if (index > 0) {
    c = (name.length() - index) / 2;
    name = name.substring(0, index);
  }
  if (c > 0) {
    StringBuilder sb = new StringBuilder();
    while (c-- > 0) {
      sb.append("[");
    }
    if ("void".equals(name)) { sb.append('V'); }
    else if ("boolean".equals(name)) { sb.append('Z'); }
    else if ("byte".equals(name)) { sb.append('B'); }
    else if ("char".equals(name)) { sb.append('C'); }
    else if ("double".equals(name)) { sb.append('D'); }
    else if ("float".equals(name)) { sb.append('F'); }
    else if ("int".equals(name)) { sb.append('I'); }
    else if ("long".equals(name)) { sb.append('J'); }
    else if ("short".equals(name)) { sb.append('S'); }
    else { sb.append('L').append(name).append(';'); }
    name = sb.toString();
  } else {
    if ("void".equals(name)) return void.class;
    if ("boolean".equals(name)) return boolean.class;
    if ("byte".equals(name)) return byte.class;
    if ("char".equals(name)) return char.class;
    if ("double".equals(name)) return double.class;
    if ("float".equals(name)) return float.class;
    if ("int".equals(name)) return int.class;
    if ("long".equals(name)) return long.class;
    if ("short".equals(name)) return short.class;
  }
  if (cl == null)
    cl = ClassLoaderUtils.getCurrentClassLoader();
  Class<?> clazz = NAME_CLASS_CACHE.get(name);
  if (clazz == null) {
    clazz = Class.forName(name, true, cl);
    NAME_CLASS_CACHE.put(name, clazz);
  }
  return clazz;
}


SerializerCache

private ConcurrentHashMap _cachedSerializerMap;

public Serializer getSerializer(Class<?> cl) throws HessianProtocolException {
  Serializer serializer = (Serializer) _staticSerializerMap.get(cl);
  if (serializer != null) {
    return serializer;
  }

  if (this._cachedSerializerMap != null) {
    serializer = (Serializer) this._cachedSerializerMap.get(cl);
    if (serializer != null) {
      return serializer;
    }
  }

  int i = 0;
  for (; serializer == null && this._factories != null && i < this._factories.size(); i++) {
    AbstractSerializerFactory factory = this._factories.get(i);
    serializer = factory.getSerializer(cl);
  }

  if (serializer == null) {
    if (isZoneId(cl)) {
      serializer = ZoneIdSerializer.getInstance();
    } else if (isEnumSet(cl)) {
      serializer = EnumSetSerializer.getInstance();
    } else if (JavaSerializer.getWriteReplace(cl) != null) {
      serializer = new JavaSerializer(cl, this._loader);
    } else if (HessianRemoteObject.class.isAssignableFrom(cl)) {
      serializer = new RemoteSerializer();
    } else if (Map.class.isAssignableFrom(cl)) {
      if (this._mapSerializer == null) {
        this._mapSerializer = new MapSerializer();
      }
      serializer = this._mapSerializer;
    } else if (Collection.class.isAssignableFrom(cl)) {
      if (this._collectionSerializer == null) {
        this._collectionSerializer = new CollectionSerializer();
      }
      serializer = this._collectionSerializer;
    } else if (cl.isArray()) {
      serializer = new ArraySerializer();
    } else if (Throwable.class.isAssignableFrom(cl)) {
      serializer = new ThrowableSerializer(cl, getClassLoader());
    } else if (InputStream.class.isAssignableFrom(cl)) {
      serializer = new InputStreamSerializer();
    } else if (Iterator.class.isAssignableFrom(cl)) {
      serializer = IteratorSerializer.create();
    } else if (Enumeration.class.isAssignableFrom(cl)) {
      serializer = EnumerationSerializer.create();
    } else if (Calendar.class.isAssignableFrom(cl)) {
      serializer = CalendarSerializer.create();
    } else if (Locale.class.isAssignableFrom(cl)) {
      serializer = LocaleSerializer.create();
    } else if (Enum.class.isAssignableFrom(cl)) {
      serializer = new EnumSerializer(cl);
    }
  }
  if (serializer == null) {
    serializer = getDefaultSerializer(cl);
  }

  if (this._cachedSerializerMap == null) {
    this._cachedSerializerMap = new ConcurrentHashMap<Object, Object>(8);
  }

  this._cachedSerializerMap.put(cl, serializer);

  return serializer;
}


DeserializerCache

private ConcurrentHashMap _cachedDeserializerMap;

public Deserializer getDeserializer(Class<?> cl) throws HessianProtocolException {
  Deserializer deserializer = (Deserializer) _staticDeserializerMap.get(cl);
  if (deserializer != null) {
    return deserializer;
  }
  if (this._cachedDeserializerMap != null) {
    deserializer = (Deserializer) this._cachedDeserializerMap.get(cl);
    if (deserializer != null) {
      return deserializer;
    }
  }

  int i = 0;
  for (; deserializer == null && this._factories != null && i < this._factories.size(); i++) {
    AbstractSerializerFactory factory = this._factories.get(i);
    deserializer = factory.getDeserializer(cl);
  }

  if (deserializer == null) {
    if (Collection.class.isAssignableFrom(cl)) {
      deserializer = new CollectionDeserializer(cl);
    } else if (Map.class.isAssignableFrom(cl)) {
      deserializer = new MapDeserializer(cl);
    } else if (cl.isInterface()) {
      deserializer = new ObjectDeserializer(cl);
    } else if (cl.isArray()) {
      deserializer = new ArrayDeserializer(cl.getComponentType());
    } else if (Enumeration.class.isAssignableFrom(cl)) {
      deserializer = EnumerationDeserializer.create();
    } else if (Enum.class.isAssignableFrom(cl)) {
      deserializer = new EnumDeserializer(cl);
    } else if (Class.class.equals(cl)) {
      deserializer = new ClassDeserializer(this._loader);
    } else {
      deserializer = getDefaultDeserializer(cl);
    }
  }
  if (this._cachedDeserializerMap == null) {
    this._cachedDeserializerMap = new ConcurrentHashMap<Object, Object>(8);
  }
  this._cachedDeserializerMap.put(cl, deserializer);

  return deserializer;
}


As the source above shows, there are four local caches. Unfortunately, all four are private, so we cannot initialize them directly. The source does, however, offer ways to warm them up indirectly:

Warm-up code for DESC_CLASS_CACHE and NAME_CLASS_CACHE

// Warm up DESC_CLASS_CACHE
ReflectUtils.desc2classArray(ReflectUtils.getDesc(Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest")));
// Warm up NAME_CLASS_CACHE
ReflectUtils.name2class("cn.jdl.oms.express.model.CreateExpressOrderRequest");



Warm-up code for SerializerCache and DeserializerCache

public class JsfSerializerFactoryPreheat extends HessianSerializerFactory {

    public static void doPreheat(String className) {
        try {
            // Warm up the serializer cache
            JsfSerializerFactoryPreheat.SERIALIZER_FACTORY.getSerializer(Class.forName(className));
            // Warm up the deserializer cache
            JsfSerializerFactoryPreheat.SERIALIZER_FACTORY.getDeserializer(Class.forName(className));
        } catch (Exception e) {
            // do nothing
            log.error("JsfSerializerFactoryPreheat failed:", e);
        }
    }
}


JSF's encoding, decoding, serialization and deserialization of interface parameters also reminded us that the application's own interfaces serialize their parameters with Fastjson, and Fastjson initializes a SerializeConfig on first use, which has a measurable performance cost (see
https://www.ktanx.com/blog/p/3181). Fastjson can be warmed up with the following code:

JSON.parseObject(JSON.toJSONString(Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest").newInstance()),
        Class.forName("cn.jdl.oms.express.model.CreateExpressOrderRequest"));
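Putting the pieces together, a single best-effort startup hook can warm up the reflection caches, the Hessian serializer caches and Fastjson for every request/response class the gateway exposes. The sketch below is an illustration under assumptions: the class list and the SerializationPreheater/preheat names are invented here, while ReflectUtils and JsfSerializerFactoryPreheat refer to the snippets shown above:

import java.util.Arrays;
import java.util.List;

import com.alibaba.fastjson.JSON;

// Hypothetical orchestration of the cache warm-ups described above.
// ReflectUtils is the JSF utility used earlier; JsfSerializerFactoryPreheat is the class defined above.
public class SerializationPreheater {

    // Illustrative list; in practice it would cover all JSF parameter and return types
    private static final List<String> PREHEAT_CLASSES = Arrays.asList(
            "cn.jdl.oms.express.model.CreateExpressOrderRequest");

    public static void preheat() {
        for (String className : PREHEAT_CLASSES) {
            try {
                Class<?> clazz = Class.forName(className);
                // DESC_CLASS_CACHE / NAME_CLASS_CACHE
                ReflectUtils.desc2classArray(ReflectUtils.getDesc(clazz));
                ReflectUtils.name2class(className);
                // SerializerCache / DeserializerCache
                JsfSerializerFactoryPreheat.doPreheat(className);
                // Fastjson SerializeConfig
                JSON.parseObject(JSON.toJSONString(clazz.newInstance()), clazz);
            } catch (Exception e) {
                // Warm-up is best-effort; never let it fail application startup
            }
        }
    }
}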


So far, the following warm-up work is done at application startup:

• JSF-BZ thread pool warm-up

• JSF-SEV-WORKER thread warm-up

• JSF encoding/decoding and serialization/deserialization cache warm-up

• Fastjson warm-up

With these warm-ups in place, the jitter caused by deployment improved markedly: the caller-side MAX value dropped from 10000 ms-20000 ms before the fix to 2000 ms-3000 ms, only slightly above the normal day-to-day fluctuation.

Solutions

Service jitter caused by application deployment is a common problem. The options currently available are:

1. The warm-up solution officially provided by JSF (https://cf.jd.com/pages/viewpage.action?pageId=1132755015)

Advantages: only platform configuration is needed, so the integration cost is low.

2. Traffic record-and-replay warm-up

Advantages: integrates with 行云 deployment orchestration (take offline, deploy, warm up, bring online), and replaying recorded traffic as a form of load test makes the warm-up more thorough.

3. The approach described in this article

Advantages: thorough resource warm-up; simple to use; supports custom extension.

Warm-up Results

Before warm-up:

Summary

Upstream service jitter caused by application deployment is a common problem. If upstream systems are sensitive to the jitter, or it causes real business impact, it deserves serious attention. The Baichuan system discussed here does nothing but expose JSF services, with no other middleware involved; its distinguishing features are a large number of interfaces and a high call volume.

The problem was barely noticeable in the system's early days, when deployments went unnoticed upstream, but it surfaced as call volume grew. Simply scaling out can relieve it, but at the cost of a lot of wasted resources, which runs against the goal of cost reduction. Instead, starting from the available clues and digging step by step into the JSF source, we fully initialize and warm up the thread pools and local caches at startup, which effectively reduces the service jitter at the moment the JSF provider goes online.
