### Software Versions and Tools

- JDK 8
- Spark 2.4.3: [Download](https://blog-1310034074.cos.ap-hongkong.myqcloud.com/BigData/spark-2.4.3-bin-hadoop2.7.tgz)
- Hadoop 2.7.1: [Download](https://blog-1310034074.cos.ap-hongkong.myqcloud.com/BigData/hadoop-2.7.1.tar.gz)
- winutils-master: [Download](https://blog-1310034074.cos.ap-hongkong.myqcloud.com/BigData/winutils-master.zip)

---

### Installation Steps

**1. Install Hadoop**

Unzip the *winutils* and *Hadoop* archives. When developing **Spark** programs in IDEA, you need to simulate the *Hadoop* environment on the development machine; otherwise every debugging run would require packaging the program into a jar and submitting it to the cluster, which seriously hurts development efficiency. winutils is the Hadoop debugging toolkit for Windows; it contains the essential binaries needed to debug Hadoop and Spark on that platform.

<!-- more -->

Enter the winutils directory and copy all of its contents into the bin directory of the Hadoop installation, adding or replacing files as prompted.

Right-click *My Computer - Properties - Advanced System Settings - Environment Variables*, create a new system variable named **HADOOP_HOME**, and set its value to the Hadoop installation directory whose bin folder you just updated.



Find the **Path** variable, double-click it to open the edit dialogue, select *New*, and add the bin directory of Hadoop (`%HADOOP_HOME%\bin`).



Open the **etc\hadoop** directory under the Hadoop folder, edit the **hadoop-env.cmd** file, and set JAVA_HOME to the JDK path that the system variable points to. (If the JDK sits under a path containing spaces, such as `C:\Program Files`, the script may break; the 8.3 short form `C:\PROGRA~1` is a common workaround.)



---

**2. Install Python**

Hadoop 2.7 and Spark 2.4 do not work with newer Python releases, so we use **Anaconda** to set up the Python environment. After installing Anaconda, open Anaconda Navigator, select **Environments**, and create a new Python 3.6.13 environment.



---

**3. Install Spark**

Unzip Spark into the same parent directory as Hadoop and configure the environment variables in the same way (set **SPARK_HOME** and add its bin directory to **Path**). Then copy the **pyspark** package from the python directory of the Spark installation into the Lib directory of the Python environment.



Enter the Scripts directory of the Python environment and run *pip install py4j* to install **py4j**. Py4J is a library written in Python and Java: through Py4J, Python programs can dynamically access Java objects inside the Java virtual machine, and Java programs can call back into Python objects.



Open cmd and run *spark-shell*. If the shell starts, the Spark configuration has succeeded.


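Before moving on, it can be handy to double-check the setup programmatically. Below is a minimal sanity-check sketch (not part of the original steps) that assumes the JAVA_HOME, HADOOP_HOME, and SPARK_HOME variables configured above; it only inspects the environment and does not start Spark:

```python
import os

# Check that the environment variables from the steps above are set
# and point at real directories.
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    path = os.environ.get(var)
    if path and os.path.isdir(path):
        print(f"{var} = {path}")
    else:
        print(f"{var} is missing or does not point to a directory")

# winutils.exe must sit in %HADOOP_HOME%\bin for Hadoop to run on Windows.
winutils = os.path.join(os.environ.get("HADOOP_HOME", ""), "bin", "winutils.exe")
print("winutils.exe found:", os.path.isfile(winutils))
```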
---

**4. Running PySpark**

Write a test program in **Spyder**. Here we use the classic word-count program, which counts how many times each word occurs. After writing it, save the file so it can be run later.

```python
"""
@author: JackyMu
"""
from pyspark import SparkConf, SparkContext

# Run the "WordCount" application on the local machine.
conf = SparkConf().setAppName("WordCount").setMaster("local")
sc = SparkContext(conf=conf)

inputFile = ""  # file location
textFile = sc.textFile(inputFile)

# Split each line into words, pair each word with 1,
# then sum the counts per word.
wordCount = (textFile
             .flatMap(lambda line: line.split(" "))
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b))

# foreach runs on the executors; in local mode the output appears in
# this console, on a cluster it would go to the executor logs.
wordCount.foreach(print)

sc.stop()
```

Open the **Anaconda Prompt**, run *activate python36* to activate the Python environment, and then run the file saved in the previous step with *python* followed by its path.


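One caveat about the script above: `foreach(print)` executes on the executors, so on a real cluster the printed output would land in the executor logs rather than in your console. A variant that brings the result back to the driver before printing (a sketch, assuming the counts are small enough to fit in driver memory) looks like this:

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("WordCount").setMaster("local")
sc = SparkContext(conf=conf)

inputFile = ""  # file location, as in the script above

wordCount = (sc.textFile(inputFile)
               .flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# collect() ships the complete result back to the driver, so it is only
# appropriate when the output is small, as a word count usually is.
for word, count in wordCount.collect():
    print(word, count)

sc.stop()
```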
---

### Summary

This post briefly introduced the installation and configuration of PySpark on Windows. While a task is running, you can also open `localhost:4040` in a browser to reach the web UI of the Spark application that was started.

*Last modified: March 28, 2024*