Writing endian-independent code in C (Don't let endianness "byte" you)

Tags: character  c  byte  integer  protocols  network

Source: http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs

 

Introduction

To understand the concept of endianness, you need to be familiar, at a highly abstract level, with memory. All you need to know about memory is that it's one large array. The array contains bytes. In the computer world, people use the term address to refer to the array locations.

Each address stores one element of the memory array. Each element is typically one byte. In some memory configurations, each address stores something other than a byte. However, those are extremely rare so, for now, let's make the broad assumption that all memory addresses store bytes.

 

Storing bytes in memory

The examples in this article use 32-bit quantities, which is the same as four bytes. On most current platforms, an int or a single-precision floating-point number is 32 bits long. But since each memory address can store a single byte and not four bytes, you need to split the 32-bit quantity into four bytes. For example, suppose you have a 32-bit quantity written in hexadecimal as 12345678. Since each hex digit is four bits, you need eight hex digits to represent the 32-bit value. The four bytes are: 12, 34, 56, and 78. There are two ways to store this in memory, as shown below.

  • Big-endian:  Store the most significant byte in the smallest address (12 34 56 78 at increasing addresses)
  • Little-endian:  Store the least significant byte in the smallest address (78 56 34 12 at increasing addresses)

Notice that the two layouts are the reverse of each other. To remember which is which, think of the least significant byte being stored first (little-endian), or the most significant byte being stored first (big-endian).
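To see the two layouts on a real machine, you can copy the bytes of a 32-bit value out of memory in address order. The following is a minimal sketch; the helper names bytes_in_memory() and print_layout() are made up for illustration and are not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a 32-bit value,
   lowest address first, into out[0..3]. */
void bytes_in_memory(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: print those bytes in address order. */
void print_layout(uint32_t value) {
    unsigned char b[4];
    size_t k;

    bytes_in_memory(value, b);
    for (k = 0; k < sizeof b; k++)
        printf("%02x ", (unsigned)b[k]);
    printf("\n");
}
```

Calling print_layout(0x12345678) prints 12 34 56 78 on a big-endian host and 78 56 34 12 on a little-endian host.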

 

Registers and endianness

Endianness only makes sense when you're breaking up a multi-byte quantity and are trying to store the bytes at consecutive memory locations. However, if you have a 32-bit register storing a 32-bit value, it makes no sense to talk about endianness. The register is neither big-endian nor little-endian; it's just a register holding a 32-bit value. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.

Some people nevertheless describe a register as big-endian, because it is conventionally drawn with its most significant byte on the left. That is only an analogy: the bytes of a register have no memory addresses.

 

Importance of endianness

Endianness is the attribute of a system that indicates whether integers are represented from left to right or right to left. In today's world of virtual machines and gigahertz processors, why would a programmer care about such a minor topic? Unfortunately, endianness must be chosen every time a hardware or software architecture is designed. There isn't much in the way of natural law to help decide, so implementations vary.

 

All processors must be designated as either big-endian or little-endian. For example, the 80x86 processors from Intel and their clones are little-endian, while Sun's SPARC, Motorola's 68K, and the PowerPC families are all big-endian.

Why is endianness so important? Suppose you are storing integer values to a file, and you send the file to a machine of the opposite endianness, which then reads in the values. This causes problems: the bytes of each value are read in reversed order, and the resulting values won't make sense.

Endianness is also a big issue when sending numbers over the network. Again, if you send a value from a machine of one endianness to a machine of the opposite endianness, you'll have problems. This is even worse over the network because you might not be able to determine the endianness of the machine that sent you the data.

Listing 1 shows an example of the dangers of programming while unaware of endianness.

Listing 1.  Example

#include <stdio.h>
#include <string.h>

int main (int argc, char* argv[]) {
    FILE* fp;

    /* Our example data structure */
    struct {
        char one[4];
        int  two;
        char three[4];
    } data;

    /* Fill our structure with data */
    strcpy (data.one, "foo");
    data.two = 0x12345678;
    strcpy (data.three, "bar");

    /* Write it to a file */
    fp = fopen ("output", "wb");
    if (fp) {
        fwrite (&data, sizeof (data), 1, fp);
        fclose (fp);
    }
}


The above code compiles properly on all machines. However, the output is different on big-endian and little-endian machines. The program outputs, when examined using the hexdump utility, are shown below in Listings 2 and 3.

Listing 2. hexdump -C output on big-endian machines

00000000        66 6f 6f 00       12 34 56 78        62 61 72 00        |foo..4Vxbar.|

0000000c

 

Listing 3. hexdump -C output on little-endian machines

00000000        66 6f 6f 00        78 56 34 12        62 61 72 00        |foo.xV4.bar.|

0000000c
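One way to make the file contents identical on both kinds of machine is to write the integer one byte at a time in a fixed order instead of dumping the in-memory representation. The sketch below picks big-endian as the file order, which is an arbitrary choice for this example; store_be32(), load_be32(), and write_u32_be() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: store a 32-bit value most significant byte first. */
void store_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: rebuild the value from bytes stored by store_be32(). */
uint32_t load_be32(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Write the integer field in the fixed byte order; returns 0 on success. */
int write_u32_be(FILE *fp, uint32_t v) {
    unsigned char buf[4];

    store_be32(buf, v);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}
```

Because the shifts work on the value rather than on its memory layout, neither function needs to know the endianness of the host.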

 

When endianness affects code

Endianness doesn't apply to everything. If you do bitwise or bitshift operations on an int, you don't notice the endianness. The machine arranges the multiple bytes so that the least significant byte is still the least significant byte, and the most significant byte is still the most significant byte.
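As a small illustration of that point, the sketch below picks a byte out of a value by its significance using only a shift and a mask. The function name byte_by_significance() is hypothetical, and the index n is assumed to be in the range 0 to 3.

```c
#include <stdint.h>

/* Hypothetical helper: return byte n of v by significance,
   where n = 0 is the least significant byte (n must be 0..3). */
unsigned byte_by_significance(uint32_t v, unsigned n) {
    return (unsigned)((v >> (8u * n)) & 0xFFu);
}
```

For the value 0x12345678, index 0 yields 0x78 and index 3 yields 0x12 on every host, because the shift operates on the value and never on its memory layout.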

Similarly, it's natural to wonder whether strings might be saved in some sort of strange order, depending on the machine. To understand this, let's go back to the basics of an array. A C-style string, after all, is still an array of characters. Each character occupies one byte of memory, since a char in C is one byte by definition. In an array, the address of consecutive array elements increases. Thus, &arr[i] is less than &arr[i+1]. Though it isn't obvious, if something is stored with increasing addresses in memory, it's going to be stored with increasing "address" in a file. When you write to a file, you usually specify an address in memory and the number of bytes you wish to write to the file starting at the address.

For example, suppose you have a C-style string in memory called man. Assume that m is stored at address 1000, a at address 1001, and n at address 1002. The null character \0 is at address 1003. Since C-style strings are arrays of characters, they follow the rules of characters. Unlike int or long, you can easily see the individual bytes of a C-style string, one byte at a time. You use array indexing to access the bytes (characters) of the string. But you can't easily index the bytes of an int or long without playing some pointer tricks. The individual bytes of an int are more or less hidden from you.

Now imagine writing out this string to a file using some sort of write() method. You specify a pointer to m and the number of bytes you wish to print (in this case, four). The write() method proceeds byte by byte in the character string and writes it to the file, starting with m and working to the null character.

Given this explanation, it's clear that endianness doesn't matter with C-style strings.

Endianness does matter when you use a type cast that depends on a particular byte order being in use. One example is shown in Listing 4, but keep in mind that there are many different type casts that can cause problems.

Listing 4. Forcing a byte order

unsigned char endian[2] = {1, 0};
short x;
  
x = *(short *) endian;


What would be the value of x? Let's look at what this code is doing. You're creating an array of two bytes, and then casting that array of two bytes into a single short. By using an array, you basically forced a certain byte order, and you're going to see how the system treats those two bytes.

If this is a little-endian system, the first byte (1) is the low byte and the second byte (0) is the high byte, so the pair is read as if it were written 0, 1. Since the high byte is 0 and the low byte is 1, x is equal to 1.

On the other hand, if it's a big-endian system, the high byte is 1 and the value of x is 256.
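The pointer cast in Listing 4 can also run into alignment and aliasing restrictions on some compilers and processors. A common alternative that shows the same effect is to copy the bytes with memcpy(). The sketch below assumes a 16-bit type (int16_t), and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret two bytes as a 16-bit integer in
   host byte order. memcpy() copies the bytes without requiring the
   source array to be aligned for a 16-bit access. */
int16_t int16_from_host_bytes(const unsigned char bytes[2]) {
    int16_t x;

    memcpy(&x, bytes, sizeof x);
    return x;
}
```

With the array {1, 0} from Listing 4, this returns 1 on a little-endian host and 256 on a big-endian host, so it is still byte-order dependent; it only removes the risky cast.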

 

Determine endianness at run time

One way to determine endianness is to test the memory layout of a predefined constant. For example, you know that the layout of a 32-bit integer variable with a value of 1 is 00 00 00 01 for big-endian and 01 00 00 00 for little-endian. By looking at the first byte of the constant, you can tell the endianness of the running platform and then take the appropriate action.

Listing 5 tests the first byte of the multi-byte integer i to determine if it is 0 or 1. If it is 1, the running platform is assumed to be little-endian. If it is 0, it is assumed to be big-endian.

Listing 5. Determining endianness

#include <stdio.h>
#include <stdlib.h>

const int i = 1;
#define is_bigendian() ( (*(const char*)&i) == 0 )

int main(void) {
    int val;
    unsigned char *ptr;
    ptr = (unsigned char*) &val;
    val = 0x12345678;
    /* Print the bytes from most significant to least significant */
    if (is_bigendian()) {
        printf("%X.%X.%X.%X\n", ptr[0], ptr[1], ptr[2], ptr[3]);
    } else {
        printf("%X.%X.%X.%X\n", ptr[3], ptr[2], ptr[1], ptr[0]);
    }
    exit(0);
}


Another way to determine endianness is to use a character pointer to the bytes of an int and then check its first byte to see if it is 0 or 1. Listing 6 shows an example.

Listing 6. Character pointer

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN    1

int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
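The same check is sometimes written with a union instead of a pointer cast, so that the integer and its bytes share the same storage. This is a sketch of that variant, not a listing from the original article; the name host_is_little_endian() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void) {
    union {
        uint32_t value;
        unsigned char bytes[4];
    } probe;

    probe.value = 1u;            /* only the least significant byte is 1 */
    return probe.bytes[0] == 1;  /* is that byte at the lowest address? */
}
```

Reading the bytes member after writing the value member simply inspects the object representation through an unsigned char view, which C permits.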


Network byte order

Network stacks and communication protocols must also define their endianness. Otherwise, two nodes of different endianness would be unable to communicate. This is a more substantial example of endianness affecting the embedded programmer. All of the protocol layers in the Transmission Control Protocol and the Internet Protocol (TCP/IP) suite are defined to be big-endian. Any 16-bit or 32-bit value within the various layer headers (such as an IP address, a packet length, or a checksum) must be sent and received with its most significant byte first.

The multi-byte integer representation used by the TCP/IP protocols is sometimes called network byte order. Even if the computers at each end are little-endian, multi-byte integers passed between them must be converted to network byte order prior to transmission across the network and converted back to little-endian at the receiving end.

Assume you want to establish a TCP socket connection to a computer whose IP address is 192.0.1.2. Internet Protocol version 4 ( IPv4 ) uses a unique 32-bit integer to identify each network host. The dotted decimal IP address must be translated into such an integer.

Suppose an 80x86-based PC is to talk to a SPARC-based server over the Internet. Without further manipulation, the 80x86 processor would convert 192.0.1.2 to the little-endian integer 0x020100C0 and transmit the bytes in the order 02 01 00 C0. The SPARC would receive the bytes in the order 02 01 00 C0, reconstruct the bytes into the big-endian integer 0x020100C0, and misinterpret the address as 2.1.0.192.

If the stack runs on a little-endian processor, it has to reorder, at run time, the bytes of every multi-byte data field within the various headers of the layers. If the stack runs on a big-endian processor, there's nothing to worry about. For the stack to be portable (so it runs on processors of both types), it has to decide whether or not to do this reordering, typically at compile time.
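To make the address example concrete, the sketch below builds the 32-bit value for a dotted-decimal address and lays its bytes out most significant byte first, which is the order they should take on the wire. The function ipv4_to_wire() is a hypothetical helper written for this illustration, not a sockets API call, and each octet is assumed to be in the range 0 to 255.

```c
#include <stdint.h>

/* Hypothetical helper: place the four octets of a dotted-decimal IPv4
   address into out[] in network byte order (first octet first). */
void ipv4_to_wire(unsigned a, unsigned b, unsigned c, unsigned d,
                  unsigned char out[4]) {
    /* For 192.0.1.2 this value is 0xC0000102 on every host */
    uint32_t addr = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
                    ((uint32_t)c << 8) | (uint32_t)d;

    out[0] = (unsigned char)(addr >> 24);
    out[1] = (unsigned char)(addr >> 16);
    out[2] = (unsigned char)(addr >> 8);
    out[3] = (unsigned char)addr;
}
```

For 192.0.1.2 the output bytes are C0 00 01 02 regardless of the processor, so the receiver reads the address correctly.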

To enable these conversions, the sockets API provides a set of macros (implemented as functions on some platforms) that convert between host and network byte order, as shown below.

 

htons()
Reorder the bytes of a 16-bit unsigned value from processor order to network order. The macro name can be read as "host to network short."
htonl()
Reorder the bytes of a 32-bit unsigned value from processor order to network order. The macro name can be read as "host to network long."
ntohs()
Reorder the bytes of a 16-bit unsigned value from network order to processor order. The macro name can be read as "network to host short."
ntohl()
Reorder the bytes of a 32-bit unsigned value from network order to processor order. The macro name can be read as "network to host long."

 

Consider the C program in Listing 7.

Listing 7. Sample C program

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h> /* htonl() */

int main(void) {
    size_t i;
    uint32_t x = 0x112A380; /* Value to play with */
    unsigned char *ptr = (unsigned char *) &x; /* Byte pointer */

    /* Observe value in host byte order */
    printf("x in hex: %x\n", (unsigned) x);
    printf("x by bytes: ");

    for (i = 0; i < sizeof(x); i++)
        printf("%x\t", ptr[i]);
    printf("\n");

    /* Observe value in network byte order */
    x = htonl(x);
    printf("\nAfter htonl()\n");
    printf("x in hex: %x\n", (unsigned) x);
    printf("x by bytes: ");

    for (i = 0; i < sizeof(x); i++)
        printf("%x\t", ptr[i]);
    printf("\n");

    return 0;
}

This program shows how the 32-bit variable x with the value 112A380 (hexadecimal) is stored. A fixed-width 32-bit type is used because htonl() operates on 32-bit values, while long is 64 bits wide on many current platforms.

When this program is executed on a little-endian processor, it outputs the information in Listing 8.


Listing 8. Little-endian output

x in hex: 112a380
x by bytes: 80 a3 12 1

After htonl()
x in hex: 80a31201
x by bytes: 1 12 a3 80

 

When you look at the individual bytes of x, you find the least significant byte (0x80) in the lowest address. After you call htonl() to convert to network byte order, you get the most significant byte (0x1) in the lowest address. Of course, if you try to print the converted x as an integer, you see a different number (80a31201), because the host still interprets the reordered bytes in little-endian order. Listing 9 shows the output from executing the same program on a big-endian processor.

 

Listing 9. Big-endian output

x in hex: 112a380
x by bytes: 1 12 a3 80
After htonl()
x in hex: 112a380
x by bytes: 1 12 a3 80

Here you see the most significant byte ( 0x1 ) in the lowest address. Calling htonl() to convert to network byte order does not change anything because network byte order is already big-endian.
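To show what such a conversion has to do internally, here is a sketch of a stand-in for htonl() that needs no endianness test at all. It is not how any particular sockets library implements the macro; to_network32() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for htonl(): return a value whose in-memory
   bytes are the big-endian encoding of v, whatever the host order. */
uint32_t to_network32(uint32_t v) {
    unsigned char be[4];
    uint32_t out;

    be[0] = (unsigned char)(v >> 24);  /* most significant byte first */
    be[1] = (unsigned char)(v >> 16);
    be[2] = (unsigned char)(v >> 8);
    be[3] = (unsigned char)v;
    memcpy(&out, be, sizeof out);
    return out;
}
```

On a big-endian host the result equals the input, which matches the unchanged output in Listing 9; on a little-endian host the bytes come back reversed, as in Listing 8.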

 

Reversing the byte order

Now let's get down to writing some code that is not bound to a particular endianness. There are many ways of doing this. The goal is to write code that doesn't fail, regardless of the endianness of the machine. You need to ensure that file data is in the correct byte order when it is read or written. It would also be nice to avoid having to specify conditional compilation flags and just let the code automatically determine the endianness of the machine.

Let's write a set of functions that automatically reverse the byte order of a given parameter, depending on the endianness of the machine.

First, you need to deal with short by taking apart the two bytes of the short parameter s with some simple bit math and then gluing them back together in reverse order. As shown in Listing 10 below, the function returns the reversed short if the processor is little-endian. Otherwise, it simply returns s.

Listing 10. Method 1: Using bit shifting and bit ANDs

short reverseShort (short s) {
    unsigned char c1, c2;
    
    if (is_bigendian()) {
        return s;
    } else {
        c1 = s & 255;
        c2 = (s >> 8) & 255;
    
        return (c1 << 8) + c2;
    }
}

 

In the function below, you cast short to see it as an array of characters, and then assign each byte to a new array in the reverse order, if the processor is little-endian.

Listing 11.  Method 2: Using pointer to an array of characters

short reverseShort (char *c) {
    short s;
    char *p = (char *)&s;

    if (is_bigendian()) {
        p[0] = c[0];
        p[1] = c[1];
    } else {
        p[0] = c[1];
        p[1] = c[0];
    }

    return s;
}

 

Now let's handle int.

Listing 12.  Method 1: Using bit shifting and bit ANDs with int

int reverseInt (int i) {
    unsigned char c1, c2, c3, c4;

    if (is_bigendian()) {
        return i;
    } else {
        c1 = i & 255;
        c2 = (i >> 8) & 255;
        c3 = (i >> 16) & 255;
        c4 = (i >> 24) & 255;

        /* Shift as unsigned so a byte of 0x80 or more cannot overflow a signed int */
        return (int)(((unsigned)c1 << 24) + ((unsigned)c2 << 16) + ((unsigned)c3 << 8) + c4);
    }
}

This is more or less the same thing you did to reverse a short, but it switches around four bytes instead of two.

Listing 13.  Method 2: Using pointer to an array of characters with int

int reverseInt (char *c) {
    int i;
    char *p = (char *)&i;

    if (is_bigendian()) {
        p[0] = c[0];
        p[1] = c[1];
        p[2] = c[2];
        p[3] = c[3];
    } else {
        p[0] = c[3];
        p[1] = c[2];
        p[2] = c[1];
        p[3] = c[0];
    }

    return i;
}

Again, this is exactly what you did to reverse a short, but here you swapped four bytes.

Similarly, you can write code to reverse bytes of float, long, double, and so on, but that is outside the scope of this article.
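For readers who want a starting point, one generic approach is to reverse the bytes of any object in place, given its size. This is only a sketch under the assumption that both machines use the same size and representation for the type apart from byte order (for example, IEEE 754 for float and double); reverse_bytes() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: reverse the n bytes of an object in place. */
void reverse_bytes(void *obj, size_t n) {
    unsigned char *p = obj;
    size_t lo, hi;

    if (n < 2)
        return;  /* nothing to swap for zero or one byte */
    for (lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
        unsigned char tmp = p[lo];
        p[lo] = p[hi];
        p[hi] = tmp;
    }
}
```

A call such as reverse_bytes(&d, sizeof d) on a double d would then be made only when the host byte order differs from the byte order of the data.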

 

Conclusion

There seems to be no significant advantage in using one method of endianness over the other. Both are still common, and different architectures use them. Little-endian processors (and their clones) are used in most personal computers and laptops, so the vast majority of desktop computers today are little-endian.

Endian issues do not affect sequences of single bytes, because a byte is considered an atomic unit from a storage point of view. On the other hand, sequences of multi-byte values are affected by endianness, and you need to take care while coding.

 


 

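Copyright notice: This article is the blogger's original work and is licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.
Article link: https://blog.csdn.net/ixidof/article/details/7831214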
