Lolizeppelin's Blog

Using the Linux ip command, and routing fundamentals

Posted on By gcy

Route selection (documentation link)

Crucial to the proper ability of hosts to exchange IP packets is the correct
selection of a route to the destination. The rules for the selection of route path are
traditionally made on a hop-by-hop basis [18] based solely upon the destination address of
the packet. Linux behaves as a conventional routing device in this way,
but can also provide a more flexible capability. Routes can be chosen and
prioritized based on other packet characteristics.

The route selection algorithm under linux has been generalized to enable
the powerful latter scenario without complicating the overwhelmingly common case of the former scenario.
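In other words, the Linux route selection algorithm was generalized so that the more flexible policy-based case is possible, without making the far more common destination-only case any more complicated.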

The above sections on routing to a local network and the default gateway
expose the importance of destination address for route selection.
In this simplified model, the kernel need only know the destination address of the packet,
which it compares against the routing tables to determine the route by which to send the packet.

The kernel searches for a matching entry for the destination first in the routing cache
and then the main routing table. In the case that the machine has recently transmitted
a packet to the destination address, the routing cache will contain an entry for the destination.
The kernel will select the same route, and transmit the packet accordingly.
That is, the kernel uses the route returned by the cache (without consulting the rules or the route tables) and transmits the packet accordingly.

If the linux machine has not recently transmitted a packet to this destination address,
it will look up the destination in its routing table using a technique known as longest prefix match [19].
In practical terms, the concept of longest prefix match means that the most specific route to the destination will be chosen.
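if packet.routeCacheLookupKey in routeCache :
    # cache hit: reuse the cached route, skip the RPDB and the route tables
    route = routeCache[ packet.routeCacheLookupKey ]
else :
    # cache miss: walk the RPDB rules in priority order
    for rule in rpdb :
        if packet.rpdbLookupKey in rule :
            routeTable = rule[ lookupTable ]
            if packet.routeLookupKey in routeTable :
                route = routeTable[ packet.routeLookupKey ]
                break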
This pseudocode provides some explanation of the decisions required to find a route.
The final piece of information required to understand the decision
making process is the lookup process for each of the three hash table lookups.
In Table 4.1, “Keys used for hash table lookups during route selection”,
each key is listed in order of importance.
Optional keys are listed in italics and represent keys that will be matched if they are present.

Table 4.1. Keys used for hash table lookups during route selection

route cache    RPDB           route table (kernel)
-----------    -----------    --------------------
destination    source         destination
source         destination    ToS
ToS            ToS            scope
fwmark         fwmark         oif
iif            iif            (none)
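
The longest prefix match described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel's implementation: the function name longest_prefix_match and the contents of ROUTES are assumptions made up for the example.

```python
import ipaddress

def longest_prefix_match(destination, routes):
    """Return the (network, nexthop) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, nexthop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            # Keep the most specific (longest) prefix seen so far.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, nexthop)
    return best

# Hypothetical routing table: (prefix, description of where the packet goes).
ROUTES = [
    ("0.0.0.0/0", "default gateway"),
    ("192.168.0.0/16", "site router"),
    ("192.168.1.0/24", "directly connected"),
]

match = longest_prefix_match("192.168.1.7", ROUTES)
print(match[0], "->", match[1])
```

All three prefixes contain 192.168.1.7, but the /24 is the most specific, so it is the one selected.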

ip link (documentation link)

The ip link tool provides the following two verbs: ip link show and ip link set.

B.3.1. Displaying link layer characteristics with ip link show
Displaying link-layer (OSI layer 2: 802.2, 802.3, ATM, HDLC, Frame Relay) attributes with ip link show

To display link layer information, ip link show will fetch characteristics of the link layer devices currently available.
Any networking device which has a driver loaded can be classified as an available device.
It is immaterial to ip link whether the device is in use by any higher layer protocols (e.g., IP).
You can specify which device you want to know more about with the dev <interface> option.
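In short: ip link show fetches the attributes of the link-layer devices that are currently available.
Whether a device is already being used by a higher-layer protocol (for example IP) does not matter to ip link.
Pass dev <interface> to get more information about one specific device.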

What is scope? (related documentation link)

Scope is normally determined by the ip utility without explicit use on the command line.
For example, an IP address in the 127.0.0.0/8 range falls in the range of localhost IPs,
so should not be routed out any device. This explains the presence of
the host scope for addresses bound to interface lo. Usually,
addresses on other interfaces are public interfaces, which means
that their scope will be global. We will revisit scope again
when we discuss routing with ip route, and there we will also encounter the link scope.
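In short: scope is usually not given explicitly on the ip command line. For example, addresses in the localhost range all belong to the local machine,
so they never need to be routed out of any device; the scope host entry bound to lo in the local table is one example.
Addresses on the other interfaces (anything but lo) are usually public, which means their scope is global.
Scope comes up again when routing with ip route is discussed, and that is also where link scope appears.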
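
The rule of thumb in the paragraph above (loopback addresses get host scope, other addresses are usually global) can be written as a tiny helper. This is a sketch of that rule of thumb only, not of what the ip utility actually does internally; address_scope is a hypothetical name and the sample addresses are made up.

```python
import ipaddress

def address_scope(addr):
    """Guess the scope of an interface address using the rule of thumb above."""
    ip = ipaddress.ip_interface(addr).ip
    # Loopback addresses never leave the machine, so they get host scope.
    if ip.is_loopback:
        return "host"
    # Everything else is assumed to be a normal, globally valid address.
    return "global"

for sample in ("127.0.0.1/8", "192.168.1.10/24"):
    print(sample, "->", address_scope(sample))
```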

man 8 ip
       the scope of the destinations covered by the route prefix.  SCOPE_VAL may be a number or a string from
       the file /etc/iproute2/rt_scopes.  If this parameter is omitted, ip assumes scope global for all
       gatewayed unicast routes, scope link for direct unicast and broadcast routes and scope host for local
       routes.

       In short, when the parameter is omitted the ip command assumes:
       scope global for all unicast routes that go through a gateway
       scope link for direct unicast routes and broadcast routes
       scope host for local (loopback) routes
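
The defaulting rule quoted from the man page can be expressed as a small function. This is a sketch of the documented behaviour only; default_scope and its two parameters are names invented for the example, and route types the excerpt does not mention are rejected rather than guessed.

```python
def default_scope(route_type, has_gateway=False):
    """Scope that ip assumes when no scope is given, per the man page excerpt."""
    if route_type == "local":
        return "host"
    if route_type == "broadcast":
        return "link"
    if route_type == "unicast":
        # Gatewayed unicast routes are global, direct ones are link-scoped.
        return "global" if has_gateway else "link"
    raise ValueError(f"no default described for route type {route_type!r}")

print(default_scope("unicast", has_gateway=True))
print(default_scope("unicast"))
print(default_scope("local"))
```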

[root@second ~]# cat /etc/iproute2/rt_scopes
0       global
255     nowhere
254     host
253     link
# pseudo-reserved
200     site
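
Files such as rt_scopes use a simple "number, whitespace, name" layout with # comments. A minimal parser for that layout might look like the following; parse_name_table is a hypothetical helper, and the text is embedded as a string (copied from the output above) so the sketch does not depend on the file existing.

```python
def parse_name_table(text):
    """Parse 'NUMBER NAME' lines into {name: number}, skipping blanks and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        number, name = line.split()[:2]
        table[name] = int(number)
    return table

RT_SCOPES = """\
0       global
255     nowhere
254     host
253     link
# pseudo-reserved
200     site
"""

scopes = parse_name_table(RT_SCOPES)
print(scopes["link"])
```

The same layout is used by the rt_tables and rt_protos files shown later in this post, so the helper should apply to them as well.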


Scope     Description
global    valid everywhere
site      valid only within this site (IPv6)
link      valid only on this device
host      valid only inside this host (machine); usually seen in the local routing table


ip route - routing table management

       Manipulate route entries in the kernel routing tables, which keep information about paths to other networked nodes.
       (The man page text omits the "which" before "keep"; the sentence is easier to read with it.)

       Route types:

               unicast - the route entry describes real paths to the destinations covered by the route prefix.
               (The "route prefix" is the network prefix, i.e. address plus netmask.)

               unreachable - these destinations are unreachable. Packets are  discarded  and  the  ICMP  message  host
               unreachable is generated.  The local senders get an EHOSTUNREACH error.

               blackhole  - these destinations are unreachable. Packets are discarded silently.  The local senders get
               an EINVAL error.

               prohibit - these destinations are unreachable. Packets are discarded and the ICMP message communication
               administratively prohibited is generated. The local senders get an EACCES error.

               local  - the destinations are assigned to this host. The packets are looped back and delivered locally.

               broadcast - the destinations are broadcast addresses. The packets are sent as link broadcasts.

               throw - a special control route used together with policy rules. If such a route is selected, lookup in
               this table is terminated pretending that no route was found. Without policy routing it is equivalent to
               the absence of the route in the routing table. The  packets  are  dropped  and  the  ICMP  message  net
               unreachable is generated. The local senders get an ENETUNREACH error.

               nat  - a special NAT route. Destinations covered by the prefix are considered to be dummy (or external)
               addresses which require translation to real (or internal) ones  before  forwarding.  The  addresses  to
               translate  to  are selected with the attribute via.  Warning: Route NAT is no longer supported in Linux

               anycast - (not implemented) the destinations are anycast addresses assigned to this host. They are mainly
               equivalent  to local with one difference: such addresses are invalid when used as the source address of
               any packet.

               multicast - a special type used for multicast routing. It is not present in normal routing tables.
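
Four of the route types above differ mainly in the error that local senders receive. The mapping below collects them from the man page excerpt, using the constants from Python's errno module; the dictionary and the describe helper are names made up for this sketch.

```python
import errno
import os

# Error seen by local senders for each route type, as listed in the excerpt above.
ROUTE_TYPE_ERRNO = {
    "unreachable": errno.EHOSTUNREACH,
    "blackhole": errno.EINVAL,
    "prohibit": errno.EACCES,
    "throw": errno.ENETUNREACH,
}

def describe(route_type):
    """Return 'type: ERRNO_NAME (system error message)' for one route type."""
    code = ROUTE_TYPE_ERRNO[route_type]
    return f"{route_type}: {errno.errorcode[code]} ({os.strerror(code)})"

for name in ROUTE_TYPE_ERRNO:
    print(describe(name))
```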

      Route tables: Linux-2.x can pack routes into several routing tables identified by a number in the range from  1
      to 255 or by name from the file /etc/iproute2/rt_tables. By default all normal routes are inserted into the main
      table (ID 254) and the kernel only uses this table when calculating routes.

      Actually, one other table always exists, which is invisible but even more important. It is the local table  (ID
      255). This table consists of routes for local and broadcast addresses. The kernel maintains this table automat-
      ically and the administrator usually need not modify it or even look at it.

      The multiple routing tables enter the game when policy routing is used.
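      When policy routing is in use, multiple routing tables come into play; which table is used is decided by ip rule (see below).
      If no custom ip rule has been configured, every packet is first looked up in the local table and, if nothing matches there, in the main table (the route cache is consulted before the local table).
      Running route or ip route without further arguments shows the contents of the main table.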

      [root@second ~]# cat /etc/iproute2/rt_tables
      255     local
      254     main
      253     default
      0       unspec


protocol RTPROTO
       the routing protocol identifier of this route.
       RTPROTO may be a number or a string from the file /etc/iproute2/rt_protos.
       If the routing protocol ID is not given, ip assumes protocol boot
       (i.e. it assumes the route was added by someone
       who doesn’t understand what they are doing).
       Several protocol values have a fixed interpretation.  Namely:

               redirect - the route was installed due to an ICMP redirect.

               kernel - the route was installed by the kernel during autoconfiguration.

               boot - the route was installed during the bootup sequence.  
               If a routing daemon starts, it  will purge all of them.

               static  - the route was installed by the administrator to override dynamic routing.
               Routing daemon will respect them and, probably, even advertise them to its peers.

               ra - the route was installed by Router Discovery protocol.

       The rest of the values are not reserved and the administrator
       is free to assign (or not to assign)  protocol tags.

       [root@second ~]# cat /etc/iproute2/rt_protos
       # Reserved protocols.
       0       unspec
       1       redirect
       2       kernel
       3       boot
       4       static
       8       gated
       9       ra
       10      mrt
       11      zebra
       12      bird
       13      dnrouted
       14      xorp
       15      ntk
       16      dhcp

What is ip rule?

ip rule - routing policy database management

       Rules in the routing policy database control the route selection algorithm.

       Classic routing algorithms used in the Internet make routing decisions based only on the destination address of
       packets (and in theory, but not in practice, on the TOS field).

       In  some  circumstances  we  want to route packets differently depending not only on destination addresses, but
       also on other packet fields: source address, IP protocol, transport protocol  ports  or  even  packet  payload.
       This task is called ’policy routing’.
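       In other words, routing then also depends on other packet attributes such as the source address, the IP protocol, the transport-protocol ports or even the payload. This is what "policy routing" means.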

       To  solve  this  task, the conventional destination based routing table, ordered according to the longest match
       rule, is replaced with a ’routing policy database’ (or RPDB), which selects routes by  executing  some  set  of
       rules.

       Each  policy  routing  rule  consists  of  a selector and an action predicate.  The RPDB is scanned in order of
       decreasing priority. The selector of each rule is applied to {source  address,  destination  address,  incoming
       interface,  tos, fwmark} and, if the selector matches the packet, the action is performed. The action predicate
       may return with success.  In this case, it will either give a route or failure indication and the  RPDB  lookup
       is terminated. Otherwise, the RPDB program continues with the next rule.
       In short, every policy routing rule has two parts: a selector (what to match) and an action (what to do on a match).

       Semantically, the natural action is to select the nexthop and the output device.

       At startup time the kernel configures the default RPDB consisting of three rules:

       1.     Priority:  0, Selector: match anything, Action: lookup routing table local (ID 255).  The local table is
              a special routing table containing high priority control routes for local and broadcast addresses.
              Summary: priority 0 (the highest priority), selector: everything, action: look up routing table local (table ID 255).

              Rule 0 is special. It cannot be deleted or overridden.

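              In other words, the local table is normally not edited from user space: its broadcast and local (loopback) entries are written by the kernel whenever an IP address is configured.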

       2.     Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254).  The  main  table
              is the normal routing table containing all non-policy routes. This rule may be deleted and/or overridden
              with other ones by the administrator.

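              Summary: priority 32766, selector: everything, action: look up routing table main (table ID 254).
              The main table is the ordinary routing table and holds all routes that are not policy routes. The administrator may delete or override this rule.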

       3.     Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253).   The  default
              table  is  empty.  It  is  reserved  for  some post-processing if no previous default rules selected the
              packet.  This rule may also be deleted.

              Summary: priority 32767, selector: everything, action: look up routing table default (table ID 253).
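
To make the interplay of rules and tables concrete, here is a toy model of the RPDB walk with the three default rules. It is a sketch under simplifying assumptions: the only selector modelled is the source prefix, the only action is a table lookup, and every table entry, address and the extra "vpn" rule are invented for illustration.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Rule:
    priority: int             # lower number = higher priority
    table: str                # name of the table to look up
    src: str = "0.0.0.0/0"    # "from" selector; the default matches everything

def lookup_table(table, dst):
    """Longest-prefix lookup of dst in one table; returns a nexthop or None."""
    addr = ipaddress.ip_address(dst)
    hits = [(ipaddress.ip_network(p), hop) for p, hop in table.items()
            if addr in ipaddress.ip_network(p)]
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[0].prefixlen)[1]

def rpdb_lookup(rules, tables, src, dst):
    """Scan rules from priority 0 upwards; the first table with a route wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if ipaddress.ip_address(src) not in ipaddress.ip_network(rule.src):
            continue  # selector does not match, try the next rule
        route = lookup_table(tables.get(rule.table, {}), dst)
        if route is not None:
            return rule.table, route
    return None

DEFAULT_RULES = [Rule(0, "local"), Rule(32766, "main"), Rule(32767, "default")]

# Hypothetical table contents.
TABLES = {
    "local": {"127.0.0.0/8": "lo (local delivery)"},
    "main": {"0.0.0.0/0": "via gateway on eth0", "10.0.0.0/24": "eth0 (direct)"},
    "default": {},
    "vpn": {"0.0.0.0/0": "via tun0"},
}

print(rpdb_lookup(DEFAULT_RULES, TABLES, "10.0.0.5", "127.0.0.1"))
print(rpdb_lookup(DEFAULT_RULES, TABLES, "10.0.0.5", "8.8.8.8"))

# A policy rule with a source selector takes effect before the main table.
custom = DEFAULT_RULES + [Rule(100, "vpn", src="10.0.0.0/24")]
print(rpdb_lookup(custom, TABLES, "10.0.0.5", "8.8.8.8"))
```

With only the default rules the lookup behaves like classic destination routing (local first, then main); adding a rule between priority 0 and 32766 is what lets a different table answer for selected packets.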

       Each RPDB entry has additional attributes. For example, each rule has a pointer to some routing table. NAT and
       masquerading rules have an attribute to select new IP address to translate/masquerade. Besides that, rules have
       some optional attributes, which routes have, namely realms.  These values do not override  those  contained  in
       the routing tables. They are only used if the route did not select any attributes.
       In short: every RPDB entry carries extra attributes; for example, each rule points to a routing table.
       Rules may also carry some of the optional attributes that routes have, namely realms (the realm to which this route is assigned; see the note on realms below).
       Such values in a rule never override the ones in the routing table; they are used only when the route itself did not set them.

       realms (documentation)

       Realms in iproute2 are a way of clustering sets of routes into groups.
       Packets following each route will be considered part of the corresponding realm,
       this classification can then be used to do bandwidth throttling, apply filters (e.g. with iptables),
       track usage statistics, and more, realm-wise.


       The RPDB may contain rules of the following types:
       (Note: ip rule seems to support fewer types than the route types available in a route table?)

               unicast - the rule prescribes to return the route found in the routing table referenced by the rule.
               blackhole - the rule prescribes to silently drop the packet.
               unreachable - the rule prescribes to generate a ’Network is unreachable’ error.
               prohibit - the rule prescribes to generate ’Communication is administratively prohibited’ error.
               nat - the rule prescribes to translate the source address of the IP packet into some other value.

Querying ip rule on a machine

[root@second ~]# ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
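
Output in this shape is easy to pick apart in a script. The parser below is a minimal sketch that only handles lines of the form shown above ("PRIORITY: selector words lookup TABLE"); parse_rule_line is a hypothetical helper and the sample text is copied from the transcript rather than read from a live system.

```python
def parse_rule_line(line):
    """Split one 'PRIO: SELECTOR... lookup TABLE' line into its parts."""
    prio, rest = line.split(":", 1)
    words = rest.split()
    if "lookup" in words:
        idx = words.index("lookup")
        selector, table = words[:idx], words[idx + 1]
    else:
        # Rules without a table lookup are kept, with table left unset.
        selector, table = words, None
    return {"priority": int(prio), "selector": " ".join(selector), "table": table}

OUTPUT = """\
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
"""

rules = [parse_rule_line(line) for line in OUTPUT.splitlines() if line.strip()]
for rule in rules:
    print(rule["priority"], rule["selector"], rule["table"])
```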

ip rule usage

   ip rule [ list | add | del ] SELECTOR ACTION

   SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ] [ dev STRING ] [ pref NUMBER ]

   ACTION := [ table TABLE_ID ] [ nat ADDRESS ] [ prohibit | reject | unreachable ] [ realms [SRCREALM/]DSTREALM ]

   TABLE_ID := [ local | main | default | NUMBER ]
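
As an illustration of how the synopsis above maps onto a concrete command line, the helper below assembles an argument list for ip rule add. It only builds the list and never runs the command; build_ip_rule_add is a hypothetical name, the prefix and preference values are made up, and only the "table TABLE_ID" action is covered.

```python
def build_ip_rule_add(table, src=None, dst=None, tos=None, fwmark=None, dev=None, pref=None):
    """Build (but do not run) an 'ip rule add' argument list."""
    argv = ["ip", "rule", "add"]
    # SELECTOR keywords, in the order they appear in the synopsis.
    for keyword, value in (("from", src), ("to", dst), ("tos", tos),
                           ("fwmark", fwmark), ("dev", dev), ("pref", pref)):
        if value is not None:
            argv += [keyword, str(value)]
    # ACTION: look up the given table.
    argv += ["table", str(table)]
    return argv

print(" ".join(build_ip_rule_add("main", src="10.0.0.0/24", pref=100)))
```

Returning a list instead of a string keeps the result suitable for subprocess.run without shell quoting concerns, should one choose to execute it.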

Divider. OpenStack uses ip rule in only one place.



Full network address translation, as performed with iproute2, can be
simulated with both netfilter SNAT and DNAT,
with the potential benefit (and attendant resource consumption) of connection tracking.

NAT introduces a complexity to the network in which
it is used because a service is reachable on a public and a private IP.

Difference between nat in a rule and nat in a route table (documentation link)


# local routing table
[root@localhost ~]# ip route show table local
broadcast dev eth0  proto kernel  scope link  src
broadcast dev lo  proto kernel  scope link  src
broadcast dev lo  proto kernel  scope link  src
broadcast dev eth0  proto kernel  scope link  src
local dev eth0  proto kernel  scope host  src
local dev lo  proto kernel  scope host  src
local dev lo  proto kernel  scope host  src
# main routing table
[root@dyb-mszl185-Center ~]# ip route show table main
dev eth0  proto kernel  scope link  src
default via dev eth0

[root@openstack ~]# ip netns exec snat-0c7df318-411b-4172-92a2-0227c6d85584 ip route
default via dev qg-7c9d90f1-b0
dev qg-7c9d90f1-b0  proto kernel  scope link  src
dev qg-7c9d90f1-b0  scope link