点滴诗词

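The Classic 0/1 Knapsack Problem

2024-05-24

Background

The 0/1 knapsack is the basic model behind the whole family of knapsack problems and a textbook dynamic programming problem. Given a knapsack of capacity V and n items, each with only a size (weight) and a value, every item is either put in or left out; the goal is the largest total value whose total size does not exceed the capacity.
Working through the basic 0/1 knapsack is a good way to get a feel for dynamic programming, so let's go straight to the code.

0/1 Knapsack Code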

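// Bag solves the 0/1 knapsack problem and prints the full dp table.
// It needs `import "fmt"`; the built-in max requires Go 1.21 or later.
// Dynamic programming: the main work is deriving the state transition.
// The capacity is fixed; each item has a weight and a value.
// dp[i][j] is the best value using the first i items with capacity j.
// Example weights: 2 3 4 5; values: 3 4 5 8.
// dp[i][j] is the larger of: 1. item i left out (or it does not fit), dp[i-1][j];
// 2. item i put in, dp[i-1][j-w[i]] + v[i].
func Bag(weight []int, values []int, cap int) {
	// Allocate the table; row 0 and column 0 stay 0.
	dp := make([][]int, len(values)+1)
	for index := range dp {
		dp[index] = make([]int, cap+1)
	}
	for index := 1; index <= len(values); index += 1 {
		for j := 1; j <= cap; j += 1 {
			if weight[index-1] > j { // does not fit
				dp[index][j] = dp[index-1][j] // leave item index out
			} else {
				noPut := dp[index-1][j] // value without the item
				// dp[index-1][j-weight[index-1]] is the best value of the previous
				// items in the capacity left after reserving room for this item.
				put := dp[index-1][j-weight[index-1]] + values[index-1] // value with the item
				dp[index][j] = max(noPut, put)
			}
		}
	}
	// Print the table row by row.
	for i := 0; i <= len(values); i += 1 {
		for j := 0; j <= cap; j += 1 {
			fmt.Printf("%d ", dp[i][j])
		}
		fmt.Println()
	}
}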

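The transition equation

The state transition equation of the 0/1 knapsack, where Wi and Pi are the weight and value of item i:

f[i,j] = max{ f[i-1,j-Wi] + Pi (only when j >= Wi), f[i-1,j] }

The equation itself is not hard, but how should we read it?

The core idea of the dynamic programming solution is that the best value for a knapsack of capacity 1 is easy to compute, and from it we can derive the best value for capacities 2, 3, 4 and so on.
Build the table below, put each item in turn into knapsacks of every capacity, and compute the best value each combination can reach.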

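Item \ knapsack capacity	0	1	2	3	4	5	6	7	8	9
No items	0	0	0	0	0	0	0	0	0	0
Item 1 (weight 2, value 3)	0 (does not fit)	0 (does not fit)	3	3	3	3	3	3	3	3
Item 2 (weight 3, value 4)	0 (does not fit)	0 (does not fit)	3 (only item 1 fits)	max(4 + 0 with item 2, 3 without) = 4	...	...	...	...	...	...
Item 3 (weight 4, value 5)	0 (does not fit)	0 (does not fit)	3	...	...	...	...	...	...	...
Item 4 (weight 5, value 8)	0 (does not fit)	0 (does not fit)	3	...	...	...	...	...	...	...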

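The table is filled by growing the knapsack capacity and the set of items one step at a time and comparing at each step.

First, be clear about what f[i, j] stands for.

f[i, j] is the best value obtainable with a knapsack of capacity j and the first i items, as in the table above:

f[0, j]: with no items, everything is 0

f[1, j]: with one item, the best value is the value of item 1 whenever it fits

f[2, j]: with two items, check each capacity in turn

As more items are added, a pattern emerges, and that pattern is the state transition equation.

In the first row there is only one item: if it does not fit the value is 0, and if it fits the value is that of the first item, 3 in the table.

In the second row there are two items and the capacity grows step by step. While item 2 does not fit, the best value is whatever item 1 alone gives. Pay attention to the threshold: as soon as item 2 fits, two cases have to be evaluated:

One is leaving item 2 out, because item 1 may be worth more than item 2.

The other is putting item 2 in, which requires the best value of the remaining capacity. That value is found in the row for item 1 only, at the column for the current capacity (3 for item 2's first fit) minus the weight of item 2 (3), i.e. item 1 at capacity 0.
The larger of these two cases is stored as the best value for the cell.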
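Because each row of the table reads only the row above it, the table can be collapsed into a single array. The sketch below is a common space-saving variant, not the author's code: the name knapsack1D is made up for illustration, and it assumes capacity is non-negative and that weights and values have the same length.

```go
package main

// knapsack1D returns the best total value for the 0/1 knapsack using a
// single row of the dp table. dp[j] holds the best value for capacity j
// over the items processed so far.
func knapsack1D(weights, values []int, capacity int) int {
	dp := make([]int, capacity+1)
	for i := range weights {
		// Walk capacities downwards so dp[j-weights[i]] still refers to the
		// previous row, which keeps every item used at most once.
		for j := capacity; j >= weights[i]; j-- {
			if v := dp[j-weights[i]] + values[i]; v > dp[j] {
				dp[j] = v
			}
		}
	}
	return dp[capacity]
}
```

For the example in this post (weights 2 3 4 5, values 3 4 5 8, capacity 9) it returns 13, reached by taking the items of weight 4 and 5.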

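Closing

The 0/1 knapsack is a classic dynamic programming algorithm, and understanding it deepens our understanding of dynamic programming as a whole.
There are many variants, of course, which I will work through later.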

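LeetCode: Longest Substring Without Repeating Characters

2024-05-22

Problem

Given a string s, find the length of the longest substring that contains no repeated characters.

Examples

Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: the longest substring without repeated characters is "abc", so the length is 3.
Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: the longest substring without repeated characters is "b", so the length is 1.
Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: the longest substring without repeated characters is "wke", so the length is 3.
     Note that the answer must be the length of a substring; "pwke" is a subsequence, not a substring.

Brute force

The first idea that comes to mind is brute force: to find the longest substring, just try them one by one from the beginning.

Fix the left boundary of the substring and move the right boundary to the right, adding one character at a time until the next character is already in the substring. That gives one maximal repeat-free substring, whose length is recorded. Then move the left boundary one character to the right and repeat, updating the maximum as you go.

The time complexity is O(n^2).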

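// LenOfNonRepeatedSubstr returns the length of the longest substring without
// repeated characters. Version 1: brute force. Needs `import "strings"`.
func LenOfNonRepeatedSubstr(str string) int {
	maxStr := "" // current repeat-free substring
	cur := 0     // start index of the current substring
	maxLen := 0
	for index := 0; index < len(str); index += 1 {
		if strings.Index(maxStr, string(str[index])) != -1 { // repeated: back up and reset
			maxStr = ""
			index = cur     // the loop's index += 1 then restarts one past the old start
			cur = index + 1 // start index of the next substring
		} else { // no repeat: extend the substring
			maxStr += string(str[index])
			if maxLen < len(maxStr) {
				maxLen = len(maxStr)
			}
		}
	}
	return maxLen
}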

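Finding where to resume

The weakness of brute force is that every restart begins matching from the next position and throws away what the previous pass learned. With abcac, for example, hitting the second a means starting over from b.

That is unnecessary. abc has no repeats and the following a is the repeat, so we only need to find where a occurs inside abc, drop everything up to and including it, and keep matching.

For example, when abc meets a, the window becomes bca, which has no repeats, and matching continues. The right boundary never moves backwards, so it takes only O(N) steps, although each step still scans the current window with strings.Index.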

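// LenOfNonRepeatedSubstrV1 returns the length of the longest substring without
// repeated characters, reusing the window that has already been matched.
func LenOfNonRepeatedSubstrV1(str string) int {
	maxStr := ""
	maxLen := 0
	for index := 0; index < len(str); index += 1 {
		for strings.Index(maxStr, string(str[index])) != -1 { // repeated: shrink from the left and check again
			// If i..j is the longest repeat-free window starting at i, a window
			// starting at i+1 cannot end before j, so it is safe to drop
			// characters from the left until str[index] is no longer inside.
			maxStr = maxStr[1:] // drop the leftmost character
		}
		// The for loop above is equivalent to:
		// pos := strings.Index(maxStr, string(str[index]))
		// if pos != -1 {
		// 	maxStr = maxStr[pos+1:]
		// }

		// str[index] is no longer in the window
		maxStr += string(str[index])
		if maxLen < len(maxStr) {
			maxLen = len(maxStr)
		}
	}
	return maxLen
}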

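This version still scans maxStr on every step to see whether it contains the character, so here is an improvement:

// LenOfNonRepeatedSubstrV2 returns the length of the longest substring without
// repeated characters, tracking the window's characters in a map.
// The built-in max requires Go 1.21 or later.
func LenOfNonRepeatedSubstrV2(str string) int {
	maxLen := 0
	strMap := make(map[byte]struct{})
	rk := -1 // right boundary of the window; -1 means the window is empty
	for index := 0; index < len(str); index += 1 {
		if index != 0 {
			delete(strMap, str[index-1]) // the left boundary moved: drop the character it passed
		}

		for rk+1 < len(str) {
			if _, ok := strMap[str[rk+1]]; !ok { // no repeat: add it and advance the right boundary
				strMap[str[rk+1]] = struct{}{}
				rk += 1
			} else {
				break
			}
		}
		maxLen = max(maxLen, rk-index+1)
	}
	return maxLen
}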

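The only change here is replacing the string with a map, so a lookup no longer scans the window.

Later, reading other people's write-ups, I found out that this technique is called a sliding window. You maintain a window (maxStr or strMap in the code above); when the next character already appears in the window, you shrink the window from the left step by step and then go back to growing it on the right.

Whatever the name, string matching optimizations all continue from the part that has already been matched: they reuse earlier work instead of starting over every time.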
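As a further sketch of the same sliding window idea, the left boundary can jump directly past the previous occurrence instead of shrinking one character at a time. This is a common variant and not taken from the post; the name longestUniqueSubstr is made up for illustration, and like the versions above it works on bytes, so it assumes single-byte characters.

```go
package main

// longestUniqueSubstr returns the length of the longest substring of s
// without repeated bytes, using a sliding window over byte positions.
func longestUniqueSubstr(s string) int {
	// last[c] is the index just after the most recent occurrence of byte c;
	// 0 means c has not been seen yet.
	var last [256]int
	best, left := 0, 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if last[c] > left {
			// c already occurs inside the window: move the left boundary
			// just past that earlier occurrence.
			left = last[c]
		}
		last[c] = i + 1
		if n := i - left + 1; n > best {
			best = n
		}
	}
	return best
}
```

Each character is visited once and every lookup is an array access, so the running time is linear in the length of s.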

One algorithm per evening. Rough going 🐶.

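Using Typora on macOS for Free, Permanently

2024-05-17

Typora

Typora is a Markdown editor and reader.

I find it pleasant to use, but it has moved to a paid license.

Below are the download links for the activated builds. Downloads may be slow, so please be patient.

Typora for Mac

Windows 1.8.10 (with activation)

Installation and use

After downloading and installing, macOS shows the following error.

Choose "Open Anyway" as configured below and it will start.

Once it opens, you can happily start using it.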

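Keeping the Database and the Cache Consistent

2023-05-22

Keeping the database and the cache consistent

Caching is a common way to speed up slow data queries. When the database becomes the bottleneck, we put a cache layer such as Redis in front of it; a cache hit skips the database query, which relieves pressure on the database and improves server performance.

Data consistency

Once a cache is introduced there are two copies of the data, so every change has to consider consistency between the cache and the database.

Updating the database and updating the cache are two separate steps. What can go wrong under high concurrency? Let's analyze it.

Update the database first

As the figure below shows, the data can become inconsistent under high concurrency. For example, request A updates the database, request B updates the database, B updates the cache, and then A updates the cache: the database holds B's value while the cache holds A's.

Update the cache first

Inconsistency can occur here as well, as the figure below shows.

So whether the database or Redis is updated first, the data can end up inconsistent. The two steps together are not atomic (concurrent requests can interleave in any order) and there is no transaction around them (one step can succeed while the other fails), so under high concurrency the execution order is unpredictable and the result may differ from what was expected.

If updating the cache is the problem, what about simply deleting it? On update, delete the cached entry; on read, if the entry is missing, query the database and fill the cache.

As the figure below shows.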

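Read path

Read the cache; on a hit, return the value directly
On a miss, read the database and fill the cache

Write path

Delete the cache
Update the database

The read logic is simple: cache first, then database. For the write path, the order of deleting the cache and updating the database looks as if it should not matter. Let's look at each order in detail.

Delete the cache first

As the figure below shows, a read request checks the cache and misses. At that moment an update request arrives and deletes the cache. The read request then reads the database (not yet updated, so it gets the old value) and writes that old value into the cache. Finally the update request writes the new value to the database, and the cache and the database are inconsistent.

Update the database first

Deleting the cache first can lead to inconsistency, so what about updating the database first? Follow along.

Again take one read request and one update request. The read request checks the cache, misses, and reads the database (still the old value). Before it writes the cache, the update request updates the database and deletes the cache. The read request then performs its cache write, and the data is inconsistent.

The root of the problem is still the same: the steps are not atomic and there is no transaction, so concurrency produces an unpredictable execution order.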
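The read path and the "update the database, then delete the cache" write path can be sketched as follows. This is a minimal illustration under stated assumptions: kv is a made-up in-memory stand-in for both the database and Redis, and repo, read and write are hypothetical names. It shows the order of operations only and does nothing to close the race described above.

```go
package main

import "sync"

// kv is a tiny thread-safe map. One instance stands in for the database and
// another for the cache; both are placeholders for real clients.
type kv struct {
	mu sync.Mutex
	m  map[string]string
}

func newKV() *kv { return &kv{m: make(map[string]string)} }

func (s *kv) get(k string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[k]
	return v, ok
}

func (s *kv) set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func (s *kv) del(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, k)
}

// repo pairs a database with a cache.
type repo struct{ db, cache *kv }

// read returns the cached value on a hit; on a miss it reads the database
// and fills the cache.
func (r *repo) read(key string) (string, bool) {
	if v, ok := r.cache.get(key); ok {
		return v, true
	}
	v, ok := r.db.get(key)
	if ok {
		r.cache.set(key, v)
	}
	return v, ok
}

// write updates the database first and then deletes the cached entry, so
// the next read reloads the new value.
func (r *repo) write(key, val string) {
	r.db.set(key, val)
	r.cache.del(key)
}
```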

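Distributed locks

For stricter scenarios, a distributed lock can turn the database update and the cache deletion into one step. The update takes a lock and releases it only after the database update and the cache deletion are both done. Reads take a lock as well: they wait if a write lock is held and proceed if only read locks are held. A distributed read-write lock removes the inconsistency caused by concurrency.

Delayed double delete

For "delete the cache first, then update the database", a delayed double delete can be used. After deleting the cache and updating the database, the update request waits for a while and deletes the cache once more, which removes any old value that a concurrent read put into the cache in the meantime.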

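Common questions

Interviews often pose a hypothetical: a network glitch makes the cache operation fail, and the data is then obviously inconsistent.

For example, the database update succeeded but deleting the cache failed. How do you keep things consistent?

Retry

To get strong consistency the only option is to delete more than once: run the delete asynchronously, retry a few times on failure, and raise an alert if it keeps failing.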

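Another option is to record the keys whose delete failed and bypass the cache for them on the next read. There are surely many other good ways to achieve strong consistency.

Subscribing to the MySQL binlog

A more advanced, or rather more complex, approach: subscribe to the binlog, which pushes every data change record, and delete the cache entry directly when a change arrives.

Every extra mechanism makes the system more complex, though, so this comes down to the trade-offs your system is willing to make.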

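Nginx Load-Balancing Algorithms

2021-09-20

nginx built-in variables

The built-in variables are provided by the ngx_http_core_module module, and their names follow the same convention as Apache server variables. In short, many of them expose the contents of the client request headers, for example $http_user_agent, $http_cookie, and so on.

The built-in variables supported by nginx are listed below:

$arg_name

The value of the query-string argument called name.
For the request URL http://service.shiguofu.cn/test?name=100&a=200
$arg_name = '100', $arg_a = '200'

$args

All arguments of the request, that is, everything after the question mark.
For example: http://service.shiguofu.cn/test?name=100&a=200
$args = 'name=100&a=200'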
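To make the relationship between the URL and these variables concrete, the sketch below splits a URL the same way using Go's net/url package. It is only an analogy for plain, unencoded URLs: requestVars is a made-up helper, and nginx's own parsing and normalization rules are not reproduced here.

```go
package main

import "net/url"

// requestVars returns the parts of rawURL that correspond to nginx's $uri,
// $args and $arg_<name> for a simple, unencoded URL.
func requestVars(rawURL, name string) (uri, args, arg string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", err
	}
	// Path is the part before "?", RawQuery everything after it, and
	// Query().Get(name) the value of one argument.
	return u.Path, u.RawQuery, u.Query().Get(name), nil
}
```

For http://service.shiguofu.cn/test?name=100&a=200 and name "name", it yields /test, name=100&a=200 and 100.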

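$binary_remote_addr

The client IP address in binary form.

$body_bytes_sent

The number of bytes sent to the client, not counting the response header.

$bytes_sent

The number of bytes nginx sent to the client, including the response header and body.

$connection

The serial number of the TCP connection. A new HTTP request does not necessarily get a new number: with HTTP keep-alive, several requests share one connection and therefore one serial number.

$connection_requests

The number of requests made so far over the current TCP connection.

$content_length

The "Content-Length" request header field, i.e. the content-length value in the client's request headers.

$content_type

The "Content-Type" request header field.

$cookie_name

The value of the cookie called name.
For example, with the request header Cookie: PHP_VERSION=1.0; NAME=XIAOMING; ...
$cookie_NAME = 'XIAOMING'

$document_root

The document root or alias for the current request, i.e. the root directory set in the configuration file.

$document_uri

The URI of the request.
For example: http://service.shiguofu.cn/test/index?a=1
$document_uri = /test/index

$host

The request host, in order of precedence: the host name from the HTTP request line, then the "Host" request header field, then the server name matching the request.

$hostname

The host name of the machine serving the request.

$http_name

Any request header field. The trailing "name" can be replaced by any request header name, lower-cased and with dashes replaced by underscores. For example, to read the "Accept-Language" request header in the configuration, use $http_accept_language.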
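The naming rule for $http_name is mechanical, so it can be written down as a tiny function. headerVar is a made-up helper used only to illustrate the rule stated above (lower-case the header name and replace dashes with underscores).

```go
package main

import "strings"

// headerVar returns the nginx variable name that exposes the given request
// header, following the rule: lower-case, dashes become underscores,
// prefixed with $http_.
func headerVar(header string) string {
	return "$http_" + strings.ToLower(strings.ReplaceAll(header, "-", "_"))
}
```

headerVar("Accept-Language") gives $http_accept_language, matching the example in the text.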

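$https

"on" if the connection uses SSL, otherwise an empty string.

$is_args

"?" if the request has arguments, otherwise an empty string.

$msec

The current Unix timestamp, in seconds with millisecond resolution.

$nginx_version

The nginx version.

$pid

The PID of the nginx worker process.

$pipe

"p" if the request was pipelined, otherwise ".".

$proxy_protocol_addr

The client address taken from the PROXY protocol header, for requests that arrive through a proxy speaking that protocol; for direct access the value is an empty string. I am still a little hazy on this one.

$query_string

The argument list of the request, same as $args.

$realpath_root

The real path of the document root or alias for the current request, with all symbolic links resolved.

$remote_addr

The client address.

$remote_port

The client port.

$remote_user

The user name supplied for HTTP Basic authentication.

$request

The HTTP request method, path and version (the full request line).
For example: http://service.shiguofu.cn/test/index
$request = GET /test/index HTTP/1.1

$request_body

The client request body, i.e. the data part of a POST body.

$request_completion

"OK" if the request has completed, otherwise an empty string.

$request_filename

The file path for the current request, built from the root or alias directive and the request URI.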

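$request_length

The length of the request (including the request line, the HTTP request headers and the request body).

$request_time

The time spent processing the client request, measured from the first byte read from the client.

$request_uri

The full URI requested by the client, including arguments.
For example: http://service.shiguofu.cn/test/index?a=1&b=200
$request_uri = /test/index?a=1&b=200

$scheme

The scheme of the request, "http" or "https".

$sent_http_name

The value of any response header field. The trailing "name" can be replaced by any response header name, lower-cased and with dashes replaced by underscores; for example, $sent_http_content_length holds the Content-Length that was sent. It reads the header and does not set it.

$server_addr

The server address, for example 172.27.0.15.

$server_name

The server name, for example service.shiguofu.cn.

$server_port

The server port.

$server_protocol

The HTTP version of the request, usually "HTTP/1.0" or "HTTP/1.1".

$status

The HTTP response status code.

$tcpinfo_rtt, $tcpinfo_rttvar, $tcpinfo_snd_cwnd, $tcpinfo_rcv_space

Details of the client TCP connection.

$time_iso8601

The server time in ISO 8601 format.

$time_local

The server time in the log format.

$uri

The current URI of the request (without arguments; those are in $args).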

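Nginx Load-Balancing Algorithms

2021-09-04

Nginx is a high-performance HTTP and reverse proxy server, known for its stability, rich feature set, sample configuration files and low system resource consumption.

It uses little memory and handles concurrency well; nginx does perform well among web servers of its kind. Sites in mainland China that use nginx include Baidu, JD.com, Sina, NetEase, Tencent and Taobao.

When Nginx works as a proxy, it can front many kinds of backend applications, such as Python's uwsgi, PHP's FastCGI, and TCP, HTTP and UDP services.

1 Configuring NGINX to proxy backend applications

1.1 Proxying uwsgi

upstream service {
   server localhost:8888;
   server 192.168.0.2:8889;
   server example.shiguofu.cn:8899;
}

server {
   location /app/service{
      uwsgi_pass service;
      include uwsgi_params;   # uwsgi parameter file, in /etc/nginx/
   }
}

The configuration above uses the uwsgi_pass directive of Nginx's uwsgi module to forward requests that match the location to the uwsgi servers listed in the upstream block; round-robin is the default.
Each uwsgi server is declared in upstream with a server directive, which accepts a UNIX socket, an IP address or an FQDN plus some optional parameters, covered later.

1.2 Proxying HTTP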

upstream service {
   server localhost:8888;
   server 192.168.0.2:8889;
   server example.shiguofu.cn:8899;
}

server {
   location /app/service{
      proxy_pass http://service;
      include proxy_params;      
   }
}

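The proxy_pass directive forwards requests that match the location to the HTTP servers in the upstream block, again round-robin.
Each HTTP server is declared in upstream with a server directive, which accepts a UNIX socket, an IP address or an FQDN plus some optional parameters, covered later.
The difference is that proxy_pass needs the http:// prefix, because upstream does not specify a protocol.

1.3 Proxying FastCGI

upstream service {
   server localhost:8888;
   server 192.168.0.2:8889;
   server example.shiguofu.cn:8899;
}

server {
   location /app/service{
      fastcgi_pass service;
      include fastcgi_params;       # FastCGI parameter file, in /etc/nginx/
   }
}

The fastcgi_pass directive forwards requests that match the location to the FastCGI servers in the upstream block, again round-robin. Unlike proxy_pass, fastcgi_pass takes the upstream name without a scheme prefix.
Each FastCGI server is declared in upstream with a server directive, which accepts a UNIX socket, an IP address or an FQDN plus some optional parameters, covered later.

1.4 Proxying TCP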

stream {
   upstream mysql_backend{
      server localhost:3306;
      server mysql.shiguofu.cn:3306;
   }
   server{
     listen 3307;
     proxy_pass mysql_backend;
   }
}

This uses Nginx's stream block, which sits at the same level as the http block, so take care where you put it. On Ubuntu the http block lives in /etc/nginx/nginx.conf, so that is where I added this configuration.

Connecting to port 3307 on the server works:

root@VM-0-15-ubuntu:/etc/nginx# mysql -h 127.0.0.1 -P 3307 -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27868
Server version: 5.7.23-0ubuntu0.16.04.1-log (Ubuntu)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.

mysql>

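2 Nginx load balancing

Nginx is widely used not only because it can act as a proxy, but also because it offers load-balancing algorithms for different workloads and can check whether the target servers are available.

2.1 Round-robin

The simplest algorithm, and Nginx's default load-balancing method.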

upstream service {
   server localhost:8888 weight=1 max_fails=3 fail_timeout=30;
   server 192.168.0.2:8889 weight=2;
   server tbk.shiguofu.cn:80 backup;
}

server {
   location /app/service{
      proxy_pass http://service;
      include proxy_params;      
   }
}

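The configuration above adds weights on top of round-robin. In this example Nginx sends two out of every three requests to the server on port 8889 and the remaining one to the local server on port 8888. The server at tbk.shiguofu.cn is a backup and is used only when the other servers are unavailable.

The weight parameter assigns a weight to each server in the round-robin rotation.
max_fails and fail_timeout are the availability settings. If 3 requests fail within 30 seconds, the server is considered down and receives no new requests until those 30 seconds are over. After that Nginx tries one new request against it; if that fails too, it waits another 30 seconds, and so on.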
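How weights turn into a request order can be sketched with a smooth weighted round-robin, the scheme commonly described for nginx's weighted rotation. This is a simplified illustration, not nginx's source: the peer type and next function are made up, and failure tracking (max_fails, fail_timeout) and backup servers are left out.

```go
package main

// peer is one upstream server in this sketch.
type peer struct {
	name    string
	weight  int // configured weight
	current int // running score, starts at 0
}

// next picks the peer for the next request. Every peer's score grows by its
// weight, the highest score wins, and the winner's score is reduced by the
// sum of all weights so that picks are spread out rather than bunched.
func next(peers []*peer) *peer {
	total := 0
	var best *peer
	for _, p := range peers {
		p.current += p.weight
		total += p.weight
		if best == nil || p.current > best.current {
			best = p
		}
	}
	if best != nil {
		best.current -= total
	}
	return best
}
```

With weights 1 (localhost:8888) and 2 (192.168.0.2:8889), three consecutive calls pick 8889, 8888, 8889, which is the two-out-of-three split described above.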

2.2 Least connections

upstream service {
   least_conn;
   server localhost:8888;
   server 192.168.0.2:8889;
   server tbk.shiguofu.cn:80;
}

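The least_conn directive above selects least-connections load balancing for the upstream servers.
Requests go to the upstream server with the fewest open connections at that moment. Like round-robin, it honours the weight, max_fails and fail_timeout options, so that more capable servers can be given more requests.

2.3 Least time

upstream service {
   least_time header;
   server localhost:8888;
   server 192.168.0.2:8889;
   server tbk.shiguofu.cn:80;
}

The least_time directive is only available in NGINX Plus, so I won't go into it. It takes a parameter (header or last_byte) that says which response time is measured.

2.4 Hashing

There are two variants: generic hash and IP hash.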

upstream service {
   hash $host;
   server localhost:8888;
   server 192.168.0.2:8889;
   server tbk.shiguofu.cn:80;
}

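Generic hashing is done with the hash directive, which computes a hash from text, variables, or a combination of them taken from the request or the runtime.
It is very useful when you need control over where requests go, or when requests should land on the server that already has the relevant data cached.
Note that when a server is added to or removed from upstream, the hashes are redistributed across the servers; the directive's optional consistent parameter keeps the number of remapped keys small.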

upstream service {
   ip_hash;
   server localhost:8888;
   server 192.168.0.2:8889;
   server tbk.shiguofu.cn:80;
}

IP hashing is done with the ip_hash directive, which computes the hash from the client's IP address.
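The idea behind both hash methods is that the same key always maps to the same server. The sketch below shows that with a plain modulo over an FNV hash. It is an assumption-laden illustration: pickByHash is a made-up name and FNV is just a convenient standard-library hash, not the function nginx uses, and nginx's ip_hash hashes only part of an IPv4 address.

```go
package main

import "hash/fnv"

// pickByHash maps key (for example a client IP or the value of $host) to one
// of servers. The same key always yields the same server as long as the
// server list does not change.
func pickByHash(key string, servers []string) string {
	if len(servers) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return servers[int(h.Sum32()%uint32(len(servers)))]
}
```

Because the index depends on len(servers), adding or removing a server remaps most keys, which is the redistribution effect mentioned above.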

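Goroutines in Practice

2021-08-03

Goroutines in practice

Goroutines are one of Go's signature features and make concurrent code much easier to write. Let's look at how they are used in real projects.

Controlling goroutine concurrency

Business code often has several independent, time-consuming operations that can run in parallel, which is exactly where goroutines come in handy, as shown below: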

// someOperation is the work to do.
// If it has data to return, pass it back through a channel.
func someOperation() error {
  time.Sleep(1 * time.Second)
  return nil
}

// anotherOperation is other work, independent of someOperation.
func anotherOperation() error {
  time.Sleep(1 * time.Second)
  return nil
}

func bizFunc() error {
  wg := sync.WaitGroup{} // sync.WaitGroup waits for the goroutines
  wg.Add(2)              // two operations to run, so add 2

  var err1, err2 error // each goroutine writes only its own variable

  go func() {
    defer wg.Done()
    err1 = someOperation()
  }()

  go func() {
    defer wg.Done()
    err2 = anotherOperation()
  }()
  wg.Wait() // wait for both goroutines to return
  if err1 != nil {
    return err1 // handle the error however you need
  }
  if err2 != nil {
    return err2
  }
  // other operations that depend on the two above
  return nil
}

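Limiting the number of goroutines

The example above needs the number of goroutines up front and then waits for all of them to finish. What if we do not know the number in advance, or want to cap it at a fixed value?

That is simple too, using the blocking behaviour of channels. Create a channel with a fixed buffer size and write one element into it whenever a goroutine is started; once the channel is full, the next write blocks. When a goroutine finishes, it takes one element out of the channel, which lets another write through. This keeps the number of goroutines at the fixed limit.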

// A wait group wrapper that limits the number of running goroutines.

import (
	"context"
	"sync"
)

const defaultSize = 32

// SizeWaitGroup is a WaitGroup with an upper limit on running goroutines.
type SizeWaitGroup struct {
	buf chan struct{}  // one slot per running goroutine
	wg  sync.WaitGroup // the real wait group
}

// NewSizeWaitGroup returns a wait group limited to size goroutines.
func NewSizeWaitGroup(size int) *SizeWaitGroup {
	if size <= 0 {
		size = defaultSize
	}
	return &SizeWaitGroup{
		buf: make(chan struct{}, size), // the buffer size is the limit
		wg:  sync.WaitGroup{},
	}
}

// Add reserves a slot, blocking while the limit is reached.
func (c *SizeWaitGroup) Add() {
	_ = c.AddWithContext(context.Background())
}

// AddWithContext reserves a slot, blocking while the limit is reached.
// It returns early with the context's error if ctx is done first.
func (c *SizeWaitGroup) AddWithContext(ctx context.Context) error {
	select {
	case <-ctx.Done(): // the caller cancelled or timed out
		return ctx.Err()
	case c.buf <- struct{}{}: // blocks while the channel is full
	}
	c.wg.Add(1) // one more goroutine is about to run
	return nil
}

// Done releases the slot of a finished goroutine.
func (c *SizeWaitGroup) Done() {
	<-c.buf // a goroutine finished
	c.wg.Done()
}

// Wait blocks until every goroutine has called Done.
func (c *SizeWaitGroup) Wait() {
	c.wg.Wait()
}

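As the code shows, we create a channel with a fixed buffer size. Before starting a goroutine, a placeholder is put into the channel (struct{} takes no memory, so even a large limit costs little), and then the real WaitGroup is incremented. When the goroutine finishes it calls Done, which takes a placeholder out of the channel and calls Done on the real WaitGroup.

It is used like this:

swg := NewSizeWaitGroup(128)
for index := 0; index < 1000; index += 1 {
  swg.Add() // blocks while 128 goroutines are already running
  go func() {
    defer swg.Done()
    // do what you want
  }()
}

swg.Wait()

This keeps the number of running goroutines at 128 or fewer.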

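Summary

Go's channels and goroutines make communication and concurrency very convenient. In real business code they make it easy to run independent pieces of work concurrently and raise the system's throughput.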

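Fuzzy Search in MySQL

2021-07-15

Fuzzy search in MySQL

Search requirements come up all the time in business development. With MySQL the obvious tool for fuzzy search is the LIKE operator. While the table is small, scanning row by row is not noticeably slow, but at millions or tens of millions of rows slow queries are only part of the problem: they can drag the whole database down and seriously hurt availability. At that point query efficiency really matters.

Fuzzy search

This is how a search is usually written (assume field is indexed):

SELECT `column` FROM `table` WHERE `field` like '%keyword%';

EXPLAIN shows that this statement does not use the index and scans the whole table instead; with a lot of data the performance is easy to imagine.

Compare it with this form:

SELECT `column` FROM `table` WHERE `field` like 'keyword%';

EXPLAIN shows that this form uses the index, and the search becomes much faster.

In practice, though, the keyword we are looking for is not always at the start of the value, so unless the requirement says so, 'keyword%' does not fit every search.

So we need another approach.

LOCATE('substr', str, start_pos)

MySQL provides the LOCATE function, which returns the position of the search string within the searched column (counting from 1, with 0 meaning not found). The first argument is the string to search for, the second is the column name, and the optional third is the character position in the column value at which to start searching.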

For example, given this table:

CREATE TABLE `meetings` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `des` varchar(225) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

The following SQL returns the rows whose des column contains "hello". The "hello" may be at the start, at the end or in the middle of des, which is very convenient.

SELECT * from meetings where LOCATE('hello',`des`) > 0;

LOCATE takes a third argument, the position to start searching from, so the SQL above can become:

SELECT * from meetings where LOCATE('hello',`des`, 5) > 0;
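The position semantics of LOCATE (1-based result, 0 for "not found", optional start position) can be mirrored in a few lines of Go. This is a simplified sketch: locate is a made-up helper that works on bytes and is case-sensitive, whereas MySQL counts characters and follows the column's collation.

```go
package main

import "strings"

// locate returns the 1-based position of the first occurrence of substr in s
// at or after the 1-based position start, or 0 if there is none.
func locate(substr, s string, start int) int {
	if start < 1 || start > len(s)+1 {
		return 0
	}
	idx := strings.Index(s[start-1:], substr)
	if idx < 0 {
		return 0
	}
	// idx is relative to the slice that begins at start-1, so convert it
	// back to a 1-based position in the whole string.
	return idx + start
}
```

So a condition such as LOCATE('hello', des, 5) > 0 keeps only the rows where "hello" begins at character 5 or later.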

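Running EXPLAIN to check whether the index is used shows that, when the searched column has an index, the query does go through it. (The output below comes from a table with an index named des_meeting on des, which the CREATE TABLE above omits.)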

mysql> explain SELECT * from meetings where LOCATE('hello',`des`) > 0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: meetings
   partitions: NULL
         type: index
possible_keys: NULL
          key: des_meeting
      key_len: 227
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

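As the EXPLAIN output above shows, the index is used. Note that type: index means the whole index is scanned rather than searched: every index entry is still read, and it is cheaper than a table scan only because the index is smaller than the table and covers the query here (Using index).

POSITION('substr' IN `field`)

POSITION can be seen as an alias of LOCATE and does the same thing. As I understand it, POSITION only checks whether the substring is present and cannot start from a given position.

SELECT * from meetings where POSITION('hello' in `des`);

INSTR(`str`, 'substr')

SELECT `column` FROM `table` WHERE INSTR(`field`, 'keyword' )>0

INSTR is a substring test as well.

FIND_IN_SET

FIND_IN_SET(str1,str2)

Returns the position of str1 within str2, where str2 must be a list separated by ",".

SELECT * FROM person WHERE FIND_IN_SET('apply',name);

If the content of name is comma-separated, such as

apple,pear,orange

then FIND_IN_SET can be used:

select * from `table` where FIND_IN_SET('apple', name);

I see this mainly as a way to search array-like data in SQL: when an array is stored in one column as a comma-separated string, FIND_IN_SET quickly finds the rows that contain a given element.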
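What FIND_IN_SET computes can be shown with a short Go equivalent. findInSet is a made-up helper that mirrors the behaviour described above (1-based position of an exact element in a comma-separated list, 0 when absent); it ignores collation and NULL handling.

```go
package main

import "strings"

// findInSet returns the 1-based position of s among the comma-separated
// elements of list, or 0 if s is not one of them.
func findInSet(s, list string) int {
	if list == "" {
		return 0
	}
	for i, item := range strings.Split(list, ",") {
		if item == s { // whole-element match, not a substring match
			return i + 1
		}
	}
	return 0
}
```

Unlike LOCATE, the match is on whole elements: "apple" is found in "apple,pear,orange", but "app" is not.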

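Summary

With fuzzy search in MySQL, keep in mind that a prefix match ('keyword%') can use the index, while a pattern with wildcards on both sides cannot use it for lookups.

The functions MySQL provides offer a roundabout way to have the query served from the index.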

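gRPC Interceptors

2021-04-14

gRPC interceptors

Server-side interceptors

Server-side interceptors for gRPC methods are typically used for pre-processing and post-processing. Here are two common examples.

When microservices call each other over gRPC, they pass along common metadata that has nothing to do with the business logic. Handling that data is a good fit for an interceptor.

The server interceptor below copies data sent by the gRPC client into the gRPC context, so the method can read it with ctx.Value.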

// MetaServerDataInterceptor copies gRPC metadata such as requestId/traceId/logId into the context.
func MetaServerDataInterceptor() grpc.UnaryServerInterceptor {
	// Interceptor signature:
	// ctx: the gRPC context
	// req: the gRPC request
	// info: information about the RPC
	// handler: the gRPC method itself
	return func(ctx context.Context,
		req interface{},
		info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {

		// read the metadata sent by the gRPC client
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			md = metadata.Pairs()
		}
		// Copy the keys you care about into the context.
		for _, key := range []string{"requestId"} {
			value := md.Get(key)
			// skip keys that are absent
			if len(value) >= 1 {
				// the gRPC method can read it back with ctx.Value(key)
				ctx = context.WithValue(ctx, key, value[0])
			}
		}
		// call the next handler
		return handler(ctx, req)
	}
}

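In real systems we often need to do something before and after a gRPC method: record the start time before the call and the elapsed time after it, inspect the result after the call, and so on.

The interceptor below records how long a method takes. Real code would of course be less crude.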

// APITimeInterceptor prints how long each unary gRPC call takes.
func APITimeInterceptor() grpc.UnaryServerInterceptor {
	// interceptor signature
	return func(ctx context.Context,
		req interface{},
		info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error) {

		// work to do before the gRPC method
		start := time.Now()
		// run the gRPC method; it returns both a response and an error
		resp, err = handler(ctx, req)
		// work to do after the gRPC method
		fmt.Println(info.FullMethod, time.Since(start))
		return resp, err
	}
}

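Server-side streaming interceptors

In Go's gRPC, interceptors for unary methods and for streaming methods have to be implemented separately. The interceptors above apply only to non-streaming methods and have no effect on streaming ones.
The streaming interceptor signature is:

type StreamServerInterceptor func(srv interface{}, ss ServerStream, info *StreamServerInfo, handler StreamHandler) error

The signature shows that a stream's context lives inside the ServerStream. To pass a modified context along, a stream interceptor therefore has to embed ServerStream and override its Context method, as shown below.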

// WrappedStream wraps around the embedded grpc.ServerStream, and intercepts the Context
type WrappedStream struct {
	grpc.ServerStream // serverStream interface
	Ctx *context.Context // replacement context, returned by Context() instead of the embedded stream's
}

// Context override the context method and can config the context manually
func (c WrappedStream) Context() context.Context {
	return *c.Ctx
}

// NewWrappedStream wrapper the grpc.ServerStream
func NewWrappedStream(s grpc.ServerStream, ctx *context.Context) grpc.ServerStream {
	wrapper := &WrappedStream{
		ServerStream: s,
		Ctx:          ctx,
	}
	stream := grpc.ServerStream(wrapper)
	return stream
}

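With this wrapper in place, the interceptor can take the incoming context, write the metadata into it, and pass it into the gRPC method through NewWrappedStream, as shown below.

Passing metadata with a streaming interceptor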

// MetaStreamServerInterceptor copies gRPC metadata into the context of a streaming method.
func MetaStreamServerInterceptor() grpc.StreamServerInterceptor {
	// interceptor signature
	return func(
		srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
		// get the current gRPC context
		ctx := ss.Context()
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			md = metadata.Pairs()
		}
		// Copy the keys you care about into the context.
		for _, key := range []string{"requestId"} {
			value := md.Get(key)
			// skip keys that are absent
			if len(value) >= 1 {
				// the gRPC method can read it back with ctx.Value(key)
				ctx = context.WithValue(ctx, key, value[0])
			}
		}
		// hand the new context to the next handler through the wrapper
		return handler(srv, streaminterceptor.NewWrappedStream(ss, &ctx))
	}
}

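gRPC client interceptors

A gRPC client interceptor runs before and after the call to a gRPC method. Metadata, for instance, has to be put into the metadata (the HTTP/2 headers) before the request is sent, otherwise it never reaches the gRPC server.
The interceptor below takes data from the current context and sends it to the server in the headers.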

// MetaClientDataInterceptor sends requestId/traceId and similar values to the gRPC server.
// keyNames is assumed to be a package-level []string of context keys to forward.
func MetaClientDataInterceptor() grpc.UnaryClientInterceptor {
	// interceptor signature
	return func(
		ctx context.Context,
		method string, req, resp interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption,
	) (err error) {
		// take the outgoing metadata, or create it if there is none yet
		md, ok := metadata.FromOutgoingContext(ctx)
		if !ok {
			md = metadata.Pairs()
		}
		for _, key := range keyNames {
			value := ctx.Value(key)
			if strValue, ok := value.(string); ok && strValue != "" {
				md.Set(key, strValue)
			}
		}
		// attach the metadata to the outgoing context
		ctx = metadata.NewOutgoingContext(ctx, md)
		// make the call
		return invoker(ctx, method, req, resp, cc, opts...)
	}
}

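Streaming client interceptors

A stream client interceptor is simple as well. Unlike on the server, the client's streaming interceptor needs no wrapper and can be used directly.
The one below again passes metadata to the server.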

// MetaStreamClientInterceptor get grpc client info, requestId/traceId/LogId for grpc stream server
func MetaStreamClientInterceptor() grpc.StreamClientInterceptor {
	// interceptor signature
	return func(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn, method string, streamer grpc.Streamer,
		opts ...grpc.CallOption) (grpc.ClientStream, error) {

		// read the outgoing metadata from the context
		md, ok := metadata.FromOutgoingContext(ctx)
		if !ok {
			md = metadata.Pairs()
		}
		for _, key := range keyNames {
			value := ctx.Value(key)
			if strValue, ok := value.(string); ok && strValue != "" {
				md.Set(key, strValue)
			}
		}
		// set metadata to ctx
		ctx = metadata.NewOutgoingContext(ctx, md)

		clientStream, err := streamer(ctx, desc, cc, method, opts...)

		return clientStream, err
	}
}

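Summary

Interceptors give gRPC servers and clients a simple, convenient way to share common modules: implement the interceptor function and pass it as an option when the server starts or when the client connects (look up the details yourself).
Keep in mind that in Go, interceptors for unary methods and for streaming methods are separate and each has to be implemented.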

Streaming server interceptor

A streaming interceptor on the server must be a function with the signature below; dig deeper if you are interested. The example is shown above.

// StreamServerInterceptor provides a hook to intercept the execution of a streaming RPC on the server.
// info contains all the information of this RPC the interceptor can operate on. And handler is the
// service method implementation. It is the responsibility of the interceptor to invoke handler to
// complete the RPC.
func(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error

Note that if a stream server interceptor needs to pass a context along, it has to wrap the ServerStream and override its Context method.

Unary server interceptor

An interceptor for unary methods is simpler; implement a function with this signature:

// @params ctx: grpc context
// @params req: the request params
// @params info: the grpc request info
// @params handler: the real grpc method
func(ctx context.Context,
		req interface{},
		info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (resp interface{}, err error)

Unary client interceptor

Common work that Go code does before calling a gRPC method, such as putting the requestId into the header.

// @params method: the RPC name
// @params req: the request
// @params resp: the response
// @params cc: the ClientConn on which the RPC was invoked
// @params invoker: the invoker of the gRPC method
// @params opts: the call options
func(
		ctx context.Context,
		method string, req, resp interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption,
	) error

Streaming client interceptor

Implement a function with this signature:

// @params desc: contains a description of the stream
// @params cc: the ClientConn on which the RPC was invoked
// @params method: the RPC name
// @params streamer: the handler to create a ClientStream and it is the responsibility of the interceptor to call it
// @params opts: the call options
func(ctx context.Context, desc *StreamDesc, cc *ClientConn, method string, streamer Streamer, opts ...CallOption) (ClientStream, error)

That covers interceptors in Go. They split into server-side and client-side interceptors, each side has a streaming and a unary form, and you implement whichever your business needs.

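A First Look at ElasticSearch

2021-01-26

What is ElasticSearch

Let's start with the definition from the official site.

Elasticsearch is a real-time, distributed storage, search, and analytics engine

To understand what it is, first look at what it can do. The concept is clear: a distributed storage, search and analytics engine.

At first glance, a database can do all of these too.

Distributed storage - a database can also run as a primary/replica cluster
Search - a database can also search with like %%

That is true, and MySQL even supports full-text search. But there is a catch: like %% does not use an index, which means that with a very large data set queries will take seconds.

There is one more concept I want to bring up: full-text search.

As with a search engine, the input varies widely: different people phrase the same meaning in different ways. A database matches such input poorly and inefficiently, and under high concurrency it can be dragged down.

ElasticSearch is built specifically for search: it aims to understand the meaning of the user's input and efficiently find the documents that match it best.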

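Elasticsearch basic concepts

Near real time (NRT)

ElasticSearch is built on the Lucene library. In Lucene, newly indexed data sitting in the in-memory buffer cannot be searched; it becomes searchable only after a refresh writes it out as a new segment. ElasticSearch refreshes once per second by default, so a change to a document becomes visible about a second later, hence "near real time". The refresh interval can be adjusted to your needs.

A Lucene index with new documents in the in-memory buffer

Cluster

A single machine cannot store massive amounts of data, so a cluster is needed: several nodes organized together that jointly hold all the data and jointly provide indexing and search.

Node

A node is one server in the cluster. It stores part of the data and takes part in indexing and search.

Shards (shards & replicas)

An index may hold more data than the hardware of a single node can handle. To solve this, Elasticsearch can split an index into several pieces called shards. To guard against a single point of failure, a shard is stored more than once: as one primary shard and one or more replica shards. The number of replicas can be changed at any time, and replicas also improve read performance.

Document

A document is the basic unit of information that can be indexed. Documents are expressed in JSON (JavaScript Object Notation).

Index

An index is a collection of documents that share somewhat similar characteristics.

Index type (type)

An index type separates different kinds of data within one index. Each document carries a _type field that identifies its type. From ES 7.x on, an index can no longer hold documents of different types; the _type field is fixed and defaults to _doc.

Before 7.x, documents of different types could be created in one index like this:

curl -XPOST localhost:9200/indexname/typename -H 'Content-Type:application/json' -d '{"data": 1234}'

ElasticSearch RESTful API

ES exposes a RESTful API for reading and writing data, configuring the cluster and querying cluster status.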

Cluster status APIs

Cluster health

GET /_cluster/health

curl http://localhost:9200/_cluster/health --user xx:xxxx
{
  "cluster_name" : "es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 1216,
  "active_shards" : 2432,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Node list

curl http://localhost:9200/_cat/nodes?v --user xxx:xxxx
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
9.135.145.82            25          92   0    0.10    0.13     0.21 cdfhilmrstw *      es-node0
9.135.91.111            21          99   0    0.01    0.07     0.08 cdfhilmrstw -      es-node1
9.135.170.150           48          36   2    0.38    0.33     0.26 cdfhilmrstw -      es-node2

Cluster health (cat API)

The result matches _cluster/health

curl --user xx:xxxx http://10.1.1.45:9200/_cat/health?v

epoch      timestamp cluster     status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1620725415 09:30:15  es-nnx25yd7 green          11         8   4632 2316    0    0        0             0                  -                100.0%

Shard allocation per node

curl --user xx:xxxx http://10.1.1.45:9200/_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
     9       38.8mb     9.1gb      8.6gb     17.7gb           51 192.168.2.114 192.168.2.114 node-1
     9       38.8mb     4.7gb       13gb     17.7gb           26 192.168.2.116 192.168.2.116 node-2

Index and document operations

List indices

curl http://localhost:9200/_cat/indices?pretty --user xx:xxxx

View index settings

curl http://localhost:9200/[index_name]/_settings

View index mappings

curl http://localhost:9200/[index_name]/_mapping --user xx:xxx

Create an index

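curl -H "Content-Type: application/json" -XPUT localhost:9200/blogs -d '
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
    }
}'

number_of_shards is the number of primary shards and number_of_replicas the number of replica copies of each primary. The number of primary shards is fixed once the index is created; to change it, create a new index and reindex the data into it.

With n nodes, at most n-1 replicas can actually be allocated, since a replica is never placed on the same node as its primary. The replica count can be changed at any time:

curl -H "Content-Type: application/json" -XPUT localhost:9200/blogs/_settings -d '
{
    "number_of_replicas": 2
}'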

Reindex

curl -H "Content-Type: application/json" -XPOST localhost:9200/_reindex -d '
{
    "source": {
        "index": "accesslog"
    },
    "dest": {
        "index": "newlog"
    }  
}'

Delete an index

curl -H "Content-Type: application/json" -XDELETE localhost:9200/[indexname]

Searching documents

POST http://localhost:9200/indexname/_search

Match all

curl -XPOST http://localhost:9200/indexname/_search -H "Content-Type:application/json" -d '{"query":{"match_all":{} } }'

Exact match (documents with price=549)

curl -XPOST http://localhost:9200/indexname/_search -H "Content-Type:application/json" -d '{"query":{"constant_score":{"filter":{"term":{"price":549} } } } }'

term query (title="java")

curl -XPOST http://localhost:9200/indexname/_search -H "Content-Type:application/json" -d '{"query":{"term":{"title":"java"} } }'

Full-text (analyzed) query

curl -XPOST http://localhost:9200/indexname/_search -H "Content-Type:application/json" -d '{"query":{"match":{"title":"Core Java"} } }'

Full-text query (all terms must match)

curl -XPOST http://localhost:9200/indexname/_search -H "Content-Type:application/json" -d '{"query":{"match":{"title":{"query":"Core Java", "operator":"and"} } } }'
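When queries like these are sent from code rather than curl, the JSON body is usually built from data structures instead of string concatenation. The sketch below builds the "all terms must match" body with Go's encoding/json; matchAllTerms is a made-up helper and sending the request is left out.

```go
package main

import "encoding/json"

// matchAllTerms builds the body of a match query on field that requires
// every analyzed term of text to be present ("operator": "and").
func matchAllTerms(field, text string) ([]byte, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{
				field: map[string]interface{}{
					"query":    text,
					"operator": "and",
				},
			},
		},
	}
	// json.Marshal escapes the text safely and sorts map keys.
	return json.Marshal(body)
}
```

matchAllTerms("title", "Core Java") produces the same query as the last curl command above, with the keys in sorted order.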

Index templates

dynamic template

"dynamic_templates": [
    {
      "my_template_name": { 
        ...  match conditions ... 
        "mapping": { ... }    # match field use mappings
      }
    },
    ...
  ]
# The match conditions can include any of : match_mapping_type, match, match_pattern, unmatch, path_match, path_unmatch.

match_mapping_type

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {       # template name
            "match_mapping_type": "long",   # every field whose detected type is long
            "mapping": {
              "type": "integer"      # map it as integer
            }
          }
        },
        {
          "string_not_analyzed": {
            "match_mapping_type": "string",   # match every string field
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type":  "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}

match and unmatch

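match and unmatch define patterns that are applied to the field name.

The following template matches every string field whose name starts with long_ and does not end with _text: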

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "longs_as_strings": {
            "match_mapping_type": "string",
            "match":   "long_*",
            "unmatch": "*_text",
            "mapping": {
              "type": "long"
            }
          }
        }
      ]
    }
  }
}

example

curl -XPOST http://10.1.1.12:9200/_template/default@template --user xx:xxxx -H "Content-Type:application/json" -d '{
    "order" : 1,
    "index_patterns" : [
      "*"
    ],
    "settings" : {
      "index" : {
        "max_result_window" : "65536",
        "refresh_interval" : "30s",
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "5m"
          }
        },
        "translog" : {
          "sync_interval" : "5s",
          "durability" : "async"
        },
        "number_of_replicas" : "1"
      }
    },
    "mappings" : {
      "dynamic_templates" : [
        {
          "message_full" : {
            "mapping" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "ignore_above" : 2048,
                  "type" : "keyword"
                }
              }
            },
            "match" : "message_full"
          }
        },
        {
          "msg" : {
            "mapping" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "ignore_above" : 2048,
                  "type" : "keyword"
                }
              }
            },
            "match_pattern": "regex",
            "match" : "msg|pl_message|json"
          }
        },
        {
          "payload_data" : {
            "mapping" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "ignore_above" : 2048,
                  "type" : "keyword"
                }
              }
            },
            "match" : "*payload"
          }
        },
        {
          "message" : {
            "mapping" : {
              "type" : "text"
            },
            "match" : "message"
          }
        },
        {
          "strings" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ]
    },
    "aliases" : { }
  }'

Snapshots

# register a snapshot repository
PUT /_snapshot/my_fs_backup
{
    "type": "fs",
    "settings": {
        "location": "/opt/backup_es",
        "compress": true
    }
}

The `location` path (/opt/backup_es here) must first be registered under path.repo in elasticsearch.yml:

path.repo: /opt/backup_es


`location`	Location of the snapshots. Mandatory.
`compress`	Turns on compression of the snapshot files. Compression is applied only to metadata files (index mapping and settings). Data files are not compressed. Defaults to `true`.
`chunk_size`	Big files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: `1GB`, `10MB`, `5KB`, `500B`. Defaults to `null` (unlimited chunk size).
`max_restore_bytes_per_sec`	Throttles per node restore rate. Defaults to `40mb` per second.
`max_snapshot_bytes_per_sec`	Throttles per node snapshot rate. Defaults to `40mb` per second.
`readonly`	Makes repository read-only. Defaults to `false`.
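
Because a repository whose `location` is not covered by path.repo is rejected, it can be handy to check the path before sending the request. The following is a sketch under assumptions: `location_allowed` and `fs_repository_body` are hypothetical helpers, only absolute POSIX paths are handled, and no `..` normalization or symlink resolution is attempted.

```python
from pathlib import PurePosixPath

def location_allowed(location, path_repo):
    """Return True if `location` equals or lies under one of the path.repo entries."""
    loc = PurePosixPath(location)
    for root in path_repo:
        root_path = PurePosixPath(root)
        if loc == root_path or root_path in loc.parents:
            return True
    return False

def fs_repository_body(location, compress=True):
    """Build the JSON body for PUT /_snapshot/<name> with type fs."""
    return {"type": "fs", "settings": {"location": location, "compress": compress}}

path_repo = ["/opt/backup_es"]  # value of path.repo in elasticsearch.yml
for candidate in ["/opt/backup_es", "/opt/backup_es/daily", "/opt/backup_es2"]:
    print(candidate, location_allowed(candidate, path_repo))
```

The comparison works on path components rather than string prefixes, so /opt/backup_es2 is correctly treated as outside /opt/backup_es.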

Snapshot policy

SLM

Setting passwords for Elasticsearch

Add the following settings to elasticsearch.yml:

xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true

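Restart Elasticsearch, then run:

bin/elasticsearch-setup-passwords interactive

The tool prompts interactively for a password for each built-in user. In the setup described here that meant four users: elastic, kibana, logstash_system and beats_system (the exact list of built-in users depends on the Elasticsearch version).

Change a password:

curl -H "Content-Type:application/json" -XPOST -u elastic 'http://127.0.0.1:9200/_xpack/security/user/elastic/_password' -d '{ "password" : "123456" }'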

Index settings

index.refresh_interval

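A document is not searchable the moment it is indexed; it becomes visible only after the next refresh. This setting controls how often the index is refreshed, that is, how long newly indexed data can take to show up in search results (1s by default).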
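
The effect of the refresh can be pictured with a toy model. This is purely illustrative and says nothing about how Elasticsearch or Lucene implement it: `ToyIndex` is a made-up class in which writes land in a buffer and only become searchable when `refresh()` is called, which in a real cluster happens periodically according to index.refresh_interval.

```python
class ToyIndex:
    """Toy model of refresh: writes are buffered and become searchable on refresh."""

    def __init__(self):
        self._buffer = []       # indexed but not yet searchable
        self._searchable = []   # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # In Elasticsearch this runs periodically, every index.refresh_interval.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d]

idx = ToyIndex()
idx.index("Core Java")
print(idx.search("Java"))   # empty: no refresh has happened yet
idx.refresh()
print(idx.search("Java"))   # the document is visible after the refresh
```

A longer interval such as the 30s used in the template example means fewer refreshes and cheaper indexing, at the cost of new documents appearing in search later.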

index.translog

sync_interval
durability

Why yellow

Failure of multiple data nodes
An index that uses corrupted or red shards
High JVM memory pressure or CPU utilization
Insufficient disk space

Fix yellow

List the unassigned shards

curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

Output:

xxxxx                             0 r UNASSIGNED INDEX_CREATED
yyyyy                             0 r UNASSIGNED INDEX_CREATED
zzzzz                             0 r UNASSIGNED INDEX_CREATED
rrrrr                             0 r UNASSIGNED INDEX_CREATED

This lists every shard that is currently unassigned.
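
When there are many unassigned shards, the text output can be turned into structured rows. A minimal sketch, assuming the column order index, shard, prirep, state, unassigned.reason requested in the command above; `unassigned_shards` is a hypothetical helper, and the STARTED row in the sample is made up to show the filtering.

```python
def unassigned_shards(cat_output):
    """Parse `_cat/shards?h=index,shard,prirep,state,unassigned.reason` text output."""
    rows = []
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[3] != "UNASSIGNED":
            continue
        rows.append({
            "index": parts[0],
            "shard": int(parts[1]),
            "primary": parts[2] == "p",   # prirep is "p" for primary, "r" for replica
            "reason": parts[4] if len(parts) > 4 else None,
        })
    return rows

sample = """\
xxxxx 0 r UNASSIGNED INDEX_CREATED
xxxxx 0 p STARTED
yyyyy 0 r UNASSIGNED INDEX_CREATED
"""
for row in unassigned_shards(sample):
    print(row)
```

The index, shard and primary values of each row are the same three fields that the allocation explain request takes in its body.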

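Find out why a shard is unassigned

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type:application/json' -d'{"index": "xxxxx", "shard": 0, "primary":false}'

Output: (not recorded)

The response explains, for every node in the cluster, why the shard cannot be allocated to it.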

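Resolution

If disk space is the problem, delete indices that are no longer needed. For other causes, resolve whatever is blocking the allocation. Some common causes:

a. cluster.max_shards_per_node defaults to 1000 and the shard limit has already been reached.

b. Disk usage has reached a configured watermark. For example, once a node's disk usage exceeds the low watermark (85% in the settings shown below), no further shards are allocated to it.

c. The index is configured so that its shards may only be allocated to hot nodes.

The current disk allocation settings can be viewed with:

curl -XGET 'localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty'

Output (truncated):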

{
  "persistent" : {
    "cluster.routing.allocation.disk.watermark.flood_stage" : "95%",
    "cluster.routing.allocation.disk.watermark.high" : "90%",
    "cluster.routing.allocation.disk.watermark.low" : "85%"
  },
  "transient" : {
    "cluster.max_shards_per_node" : "10000",
    "cluster.routing.allocation.disk.watermark.flood_stage" : "95%",
    "cluster.routing.allocation.disk.watermark.high" : "90%",
    "cluster.routing.allocation.disk.watermark.low" : "85%"
  },
  .....
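
The three watermark values in this output can be read as thresholds on a node's disk usage. The sketch below is an illustration only: `disk_state` is a hypothetical helper, it assumes the watermarks are given as percentages (they can also be configured as absolute sizes, which is not handled), and it only classifies a usage figure without talking to the cluster.

```python
def parse_percent(value):
    """Turn a setting such as "85%" into the float 85.0."""
    return float(value.rstrip("%"))

def disk_state(used_percent, settings):
    """Classify a node's disk usage against the three disk watermarks."""
    prefix = "cluster.routing.allocation.disk.watermark."
    low = parse_percent(settings[prefix + "low"])
    high = parse_percent(settings[prefix + "high"])
    flood = parse_percent(settings[prefix + "flood_stage"])
    if used_percent > flood:
        return "flood_stage"   # affected indices are switched to read-only
    if used_percent > high:
        return "high"          # shards are moved away from the node
    if used_percent > low:
        return "low"           # no new shards are allocated to the node
    return "ok"

settings = {
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.low": "85%",
}
for used in (50.0, 87.0, 92.0, 96.0):
    print(used, disk_state(used, settings))
```

A node in the "low" state is the typical reason replicas stay unassigned in a small cluster: the data is intact, but there is no node left that may accept the copy.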

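Index lifecycle management (ILM)

ILM suits a single index that keeps growing: configure an ILM rollover so that the index is split by size or by document count.

For daily indices, a delete-phase rule can be configured instead.

Create an ILM policy (hot/warm/cold/delete)
Create an index template that defines which indices the policy applies to
Create the first rollover index; its name must end in a number so that each rollover increments it, for example carlshi-000001; set the is_write_index option on its alias
Write data to the index through the alias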

For Example:

# create the index template
PUT /_template/carl_template
{
  "index_patterns": [  # index names the template applies to
    "carlshi-*"
  ],
  "settings": {
      "refresh_interval": "30s",
      "number_of_shards": "1",
      "number_of_replicas": "0"
  },
  "mappings": {   # mapping
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  }
}

Create the index:

# create the first index
PUT /carlshi-000001
{
  "aliases": {
    "carlshi-index": {        # index alias: documents written to carlshi-index go into carlshi-000001
      "is_write_index": true
    }
  }
}
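
The naming rule behind the rollover can be sketched as follows. `next_rollover_name` is a hypothetical helper that mimics the documented behavior for index names ending in a dash and a number: the number is incremented and zero-padded to six digits. It does not cover date-math names or an explicitly supplied new index name.

```python
import re

def next_rollover_name(index_name):
    """Return the name a rollover would generate for an index ending in -<number>."""
    m = re.fullmatch(r"(.*-)(\d+)", index_name)
    if m is None:
        raise ValueError("index name must end with '-' followed by a number")
    prefix, number = m.groups()
    # The counter is incremented and zero-padded to six digits.
    return f"{prefix}{int(number) + 1:06d}"

name = "carlshi-000001"
for _ in range(3):
    name = next_rollover_name(name)
    print(name)
```

After each rollover the carlshi-index alias points its is_write_index flag at the newest index, so writers keep using the alias and never need to know the current index name.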

elasticsearch docker

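Run the elasticsearch image directly; Docker pulls the image automatically and starts it. Note that options such as -v must come before the image name:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v /usr/share/elasticsearch/data:/usr/share/elasticsearch/data elasticsearch:7.5.1

Once the container is up, use curl to fetch the basic cluster information: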

curl localhost:9200
{
  "name" : "be856c56d8bd",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "bsnwunE2SnWcBIqoxbgnUw",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

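Summary

Elasticsearch is a powerful full-text search engine. Its REST API makes it simple to use, it provides strong high availability for data, and the level of availability and performance can be tuned to your own requirements.

This post gave a brief introduction to the commonly used parts of Elasticsearch: basic index properties and operations (create, delete, update, query), dynamic index templates, snapshot backups, and index lifecycle management. It also recorded how to troubleshoot a yellow cluster. Later posts will go deeper into the configuration of each module and into the internal implementation.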
